PolarMOT: How Far Can Geometric Relations Take Us in 3D Multi-Object Tracking?

TLDR: State-of-the-art, generalizable 3D multi-object tracking posed as edge classification on a continuously evolving temporal multiplex graph, whose initial features are only pairwise geometric relationships between objects (both temporal and spatial).
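To make the graph formulation concrete, here is a minimal sketch of how detections could be connected into a two-layer (multiplex) graph: spatial edges between detections in the same frame and temporal edges between detections in different frames. The `Detection` type, `build_multiplex_edges`, and the distance and frame-gap thresholds are hypothetical illustrations, not PolarMOT's actual code. The real tracker also evolves the graph over time and attaches geometric features to every edge before classifying it.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Detection:
    frame: int   # frame index
    x: float     # bird's-eye-view position in metres
    y: float
    yaw: float   # heading angle in radians (used by edge features, not here)


def build_multiplex_edges(dets, max_dist=5.0, max_frame_gap=3):
    """Split candidate edges into spatial (same frame) and temporal (across frames) sets.

    max_dist and max_frame_gap are hypothetical gating thresholds chosen for
    illustration. Temporal edges are oriented from the earlier to the later
    detection; these are the edges a classifier would label as
    "same object" or not.
    """
    spatial, temporal = [], []
    for i, j in combinations(range(len(dets)), 2):
        a, b = dets[i], dets[j]
        if math.hypot(b.x - a.x, b.y - a.y) > max_dist:
            continue  # too far apart to be related
        gap = abs(b.frame - a.frame)
        if gap == 0:
            spatial.append((i, j))
        elif gap <= max_frame_gap:
            temporal.append((i, j) if a.frame < b.frame else (j, i))
    return spatial, temporal


dets = [
    Detection(0, 0.0, 0.0, 0.0),
    Detection(0, 3.0, 0.0, 0.0),
    Detection(1, 0.5, 0.0, 0.0),
    Detection(5, 0.6, 0.0, 0.0),  # 5 frames later: beyond max_frame_gap
]
spatial, temporal = build_multiplex_edges(dets)
print(spatial)   # [(0, 1)]
print(temporal)  # [(0, 2), (1, 2)]
```

Keeping both edge types on the same node set is what makes the graph "multiplex": the same detection takes part in spatial relations within its frame and in temporal relations across frames.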

Focuses on interactions and mutual influence between objects, without using per-object information such as appearance.

Lightweight graph neural network built from 3- or 4-layer fully connected MLPs (only 71k parameters).
It runs in near real time: 33 FPS on nuScenes and 170 FPS on KITTI.
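The small parameter budget follows from the arithmetic of fully connected layers: a layer with n inputs and m outputs has n*m weights plus m biases. The sketch below computes this for a hypothetical 3-layer MLP (the layer widths are illustrative assumptions, not PolarMOT's configuration) and converts the reported frame rates into per-frame time budgets.

```python
def mlp_param_count(widths):
    """Count weights and biases of a stack of fully connected layers.

    widths = [in_dim, hidden_1, ..., out_dim]; each consecutive pair is one
    layer with w_in * w_out weights and w_out biases.
    """
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths, widths[1:]))


def frame_time_ms(fps):
    """Per-frame time budget in milliseconds at a given throughput."""
    return 1000.0 / fps


# Hypothetical 3-layer edge MLP: 4 input features -> 64 -> 64 -> 32.
edge_mlp = [4, 64, 64, 32]
print(mlp_param_count(edge_mlp))      # 6560
print(round(frame_time_ms(33), 1))    # 30.3 (ms per frame at 33 FPS)
print(round(frame_time_ms(170), 1))   # 5.9 (ms per frame at 170 FPS)
```

In a graph neural network the same MLP weights are shared across all nodes or edges, so the parameter count does not grow with the number of objects in a scene; only the compute per frame does.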

Polar parametrization of the features enables better generalization across datasets and cities without re-training.
It also lets the model perform well when trained on only 1% of the data.
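One way to read "polar parametrization" is to describe each pair of objects by a distance and by angles measured relative to one object's own heading, rather than by absolute coordinates. The sketch below assumes three such features (distance, bearing of the other object relative to the source object's heading, and heading difference); PolarMOT's exact feature set may differ, and all names here are hypothetical. Because every feature is relative, applying the same rotation and translation to the whole scene leaves them unchanged, which is one plausible reason such features transfer across cities.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def polar_edge_features(src, dst):
    """Relative polar features from src to dst.

    src, dst: (x, y, yaw) in a shared bird's-eye-view frame (metres, radians).
    Returns (distance, bearing, heading_diff).
    """
    sx, sy, syaw = src
    dx, dy, dyaw = dst
    distance = math.hypot(dx - sx, dy - sy)
    # Direction to dst, expressed relative to src's heading.
    bearing = wrap_angle(math.atan2(dy - sy, dx - sx) - syaw)
    heading_diff = wrap_angle(dyaw - syaw)
    return distance, bearing, heading_diff


def rigid_transform(obj, theta, tx, ty):
    """Rotate an (x, y, yaw) pose by theta about the origin, then translate."""
    x, y, yaw = obj
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + tx, s * x + c * y + ty, yaw + theta)


a, b = (0.0, 0.0, 0.0), (3.0, 4.0, math.pi / 2)
before = polar_edge_features(a, b)
after = polar_edge_features(rigid_transform(a, 1.0, 100.0, -50.0),
                            rigid_transform(b, 1.0, 100.0, -50.0))
# The two feature tuples agree up to floating-point error.
print(before)
print(after)
```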

Features are time-normalized to help handle occlusions and temporal gaps between detections.
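Time normalization can be illustrated by dividing displacement-type features on a temporal edge by the time gap between its two detections, turning them into rates. A detection that re-appears after an occlusion then produces features on the same scale as one matched in the very next frame. This is a hedged sketch: the function name and the choice of speed and yaw rate as the normalized quantities are assumptions, not PolarMOT's exact definition.

```python
import math


def time_normalized_features(src, dst):
    """Features of a temporal edge, normalized by the time gap.

    src, dst: (t, x, y, yaw) with t in seconds; dst must be later than src.
    Returns (dt, speed, yaw_rate): the gap itself, distance / dt, and the
    wrapped heading change / dt.
    """
    src_t, sx, sy, syaw = src
    dst_t, dx, dy, dyaw = dst
    dt = dst_t - src_t
    if dt <= 0:
        raise ValueError("dst must be later than src")
    distance = math.hypot(dx - sx, dy - sy)
    # Heading change wrapped to [-pi, pi) before dividing by the gap.
    heading_change = (dyaw - syaw + math.pi) % (2 * math.pi) - math.pi
    return dt, distance / dt, heading_change / dt


# A car moving at 10 m/s along x, matched after 0.1 s and, following an
# occlusion, after 0.3 s. The raw displacements differ (1 m vs 3 m), but the
# normalized speed is about 10 m/s in both cases.
start = (0.0, 0.0, 0.0, 0.0)
print(time_normalized_features(start, (0.1, 1.0, 0.0, 0.0)))
print(time_normalized_features(start, (0.3, 3.0, 0.0, 0.0)))
```

Keeping dt itself as a feature lets a learned model still tell a short link from a long one, while the rates stay comparable across gap lengths.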



    author={Kim, Aleksandr and Bras{\'o}, Guillem and O{\v{s}}ep, Aljo{\v{s}}a and Leal-Taix{\'e}, Laura},
    title={PolarMOT: How Far Can Geometric Relations Take Us in 3D Multi-Object Tracking?}, 
    booktitle={European Conference on Computer Vision (ECCV) 2022},
    publisher={Springer Nature Switzerland},
    doi = {10.1007/978-3-031-20047-2_3},