Rana Mostafa, Hoda Baraka and AbdelMoniem Bayoumi

**LMOT**, i.e., Light-weight Multi-Object Tracker, performs joint pedestrian detection and tracking. LMOT introduces a simplified DLA-34 encoder network to extract detection features for the current image in a computationally efficient way. Furthermore, we generate efficient tracking features using a linear transformer applied to the prior image frame and its corresponding detection heatmap. LMOT then fuses the detection and tracking feature maps in a multi-layer scheme and performs a two-stage online data association relying on the Kalman filter to generate tracklets. We evaluated our model on the challenging real-world MOT16/17/20 datasets, showing that LMOT significantly outperforms state-of-the-art trackers in runtime while maintaining high robustness. LMOT is approximately ten times faster than state-of-the-art trackers while being only 3.8% behind in accuracy on average, resulting in a computationally much lighter model.
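The abstract does not spell out the state model of the Kalman filter used during data association. As a minimal sketch only, assuming a constant-velocity model on a single box coordinate (for example the box center x) with a time step of one frame, the predict and update steps could look as follows. The class name `ConstantVelocityKalman1D` and the noise parameters `process_var` and `measurement_var` are hypothetical and are not taken from the LMOT code.

```python
class ConstantVelocityKalman1D:
    """Hypothetical 1D constant-velocity Kalman filter (state: position, velocity)."""

    def __init__(self, position, process_var=1e-2, measurement_var=1.0):
        self.p, self.v = float(position), 0.0      # state estimate
        self.P = [[1.0, 0.0], [0.0, 1.0]]          # state covariance
        self.q, self.r = process_var, measurement_var

    def predict(self):
        """Propagate one frame: p <- p + v, P <- F P F^T + q I with F = [[1, 1], [0, 1]]."""
        (a, b), (c, d) = self.P
        self.p += self.v
        self.P = [[a + b + c + d + self.q, b + d],
                  [c + d, d + self.q]]
        return self.p

    def update(self, z):
        """Correct the state with a position measurement z (H = [1, 0])."""
        (a, b), (c, d) = self.P
        s = a + self.r                 # innovation variance
        k0, k1 = a / s, c / s          # Kalman gain
        residual = z - self.p
        self.p += k0 * residual
        self.v += k1 * residual
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


# Track a coordinate that moves 2 pixels per frame, starting at 0.
kf = ConstantVelocityKalman1D(position=0.0)
for k in range(1, 31):
    kf.predict()
    kf.update(2.0 * k)
next_x = kf.predict()  # predicted position for frame 31
print(next_x, kf.v)
```

In a tracker, the predicted position would typically be compared against the detections of the new frame to decide which detection continues which tracklet; how LMOT scores and splits its two association stages is not covered by this sketch.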

Code: https://github.com/RanaMostafaAbdElMohsen/LMOT
Paper: https://doi.org/10.1109/ACCESS.2022.3197157

**FCOS** is an anchor-box-free, proposal-free, single-stage object detection model. By eliminating the predefined set of anchor boxes, FCOS avoids computation related to anchor boxes, such as calculating overlaps during training. It also avoids all hyper-parameters related to anchor boxes, to which the final detection performance is often very sensitive.
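To illustrate what replaces the anchor boxes, here is a minimal sketch of per-location regression targets, assuming the formulation from the FCOS paper in which each location inside a ground-truth box regresses its distances to the four box sides. The function name `fcos_targets` and the `(x0, y0, x1, y1)` box layout are illustrative choices, not part of any official API.

```python
def fcos_targets(x, y, box):
    """Return (left, top, right, bottom) distances from location (x, y)
    to the sides of box = (x0, y0, x1, y1), or None when the location
    does not fall strictly inside the box (a negative sample)."""
    x0, y0, x1, y1 = box
    left, top, right, bottom = x - x0, y - y0, x1 - x, y1 - y
    if min(left, top, right, bottom) <= 0:
        return None
    return (left, top, right, bottom)


box = (10, 20, 110, 80)
inside = fcos_targets(50, 40, box)   # location inside the box
outside = fcos_targets(5, 40, box)   # location left of the box
print(inside, outside)
```

No overlap between candidate boxes and ground truth is computed here: a location is labeled positive simply by testing whether it falls inside a box.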

FCOS

FCOS: Fully Convolutional One-Stage Object Detection

LMOT

**Siren**, or **Sinusoidal Representation Network**, is a neural network architecture for implicit neural representations that uses the sine as a periodic activation function:

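$$ \Phi\left(x\right) = \textbf{W}_{n}\left(\phi_{n-1} \circ \phi_{n-2} \circ \dots \circ \phi_{0} \right)\left(x\right) + \textbf{b}_{n}, \quad x_{i} \mapsto \phi_{i}\left(x_{i}\right) = \sin\left(\textbf{W}_{i}x_{i} + \textbf{b}_{i}\right) $$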
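The composition can be sketched as a small forward pass, assuming each hidden layer $\phi_i$ computes an element-wise sine of an affine map, $\sin(\textbf{W}_i x_i + \textbf{b}_i)$, followed by a final linear layer. The helper `random_layer` and its uniform initialization are hypothetical and only there to make the example runnable; they do not reproduce the initialization scheme recommended for Siren.

```python
import math
import random


def linear(x, weights, biases):
    """Affine map W x + b for one input vector x (weights is a list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def sine_layer(x, weights, biases):
    """One hidden layer phi_i: element-wise sine of an affine map."""
    return [math.sin(v) for v in linear(x, weights, biases)]


def siren_forward(x, hidden_layers, output_layer):
    """Apply phi_0, ..., phi_{n-1} in order, then the final linear map W_n x + b_n."""
    for weights, biases in hidden_layers:
        x = sine_layer(x, weights, biases)
    weights, biases = output_layer
    return linear(x, weights, biases)


def random_layer(n_in, n_out, rng):
    """Hypothetical initializer: uniform weights in [-1, 1], zero biases."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
hidden = [random_layer(2, 8, rng), random_layer(8, 8, rng)]
output = random_layer(8, 1, rng)
value = siren_forward([0.25, -0.5], hidden, output)  # e.g. a 2D coordinate -> scalar
print(value)
```

The input here is a coordinate and the output a signal value at that coordinate, which is the usual setup for an implicit neural representation.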

Year	2000
Data Source	CC BY-SA - https://paperswithcode.com