HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking

Multi-Object Tracking (MOT) has been notoriously difficult to evaluate. Previous metrics overemphasize the importance of either detection or association. To address this, we present a novel MOT evaluation metric, HOTA (Higher Order Tracking Accuracy), which explicitly balances the effects of accurate detection, association and localization within a single unified metric for comparing trackers. HOTA decomposes into a family of sub-metrics that evaluate each of five basic error types separately, enabling clear analysis of tracking performance. We evaluate the effectiveness of HOTA on the MOTChallenge benchmark and show that it captures important aspects of MOT performance not previously taken into account by established metrics. Furthermore, we show that HOTA scores align better with human visual evaluation of tracking performance.
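The abstract does not state the formulas. As a minimal sketch, assuming the definitions from the full HOTA paper as we understand them: at a localization threshold α, DetA = |TP| / (|TP| + |FN| + |FP|); each true positive c gets an association score A(c) = |TPA(c)| / (|TPA(c)| + |FNA(c)| + |FPA(c)|), and AssA is the mean of A(c) over all TPs; then HOTA_α = sqrt(DetA · AssA). The five sub-metrics (DetRe, DetPr, AssRe, AssPr, LocA) correspond to the five error types. The per-frame matching of predictions to ground truth is assumed to have been done upstream, and the names `hota_at_alpha`, `HotaAlpha`, `hota_over_alphas` and the input layout are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class HotaAlpha:
    """Scores at a single localization threshold alpha (hypothetical container)."""
    hota: float
    det_a: float
    ass_a: float
    det_re: float
    det_pr: float
    ass_re: float
    ass_pr: float
    loc_a: float


def hota_at_alpha(matches, gt_id_counts, pr_id_counts):
    """Compute HOTA_alpha and its sub-metrics.

    matches: list of (gt_id, pr_id, similarity), one entry per true positive
        across all frames (assumption: matching at this alpha is already done).
    gt_id_counts / pr_id_counts: number of detections carried by each
        ground-truth / predicted track ID over the whole sequence.
    """
    num_gt = sum(gt_id_counts.values())  # |TP| + |FN|
    num_pr = sum(pr_id_counts.values())  # |TP| + |FP|
    tp = len(matches)
    if tp == 0:
        return HotaAlpha(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)

    # Detection: DetA = |TP| / (|TP| + |FN| + |FP|)
    det_a = tp / (num_gt + num_pr - tp)
    det_re = tp / num_gt
    det_pr = tp / num_pr

    # |TPA(c)|: how many times each (gt, pred) ID pair is matched in the sequence.
    tpa = Counter((g, p) for g, p, _ in matches)

    ass_a = ass_re = ass_pr = 0.0
    for (g, p), n in tpa.items():
        # All n TPs sharing this ID pair have the same TPA/FNA/FPA, so weight by n.
        fna = gt_id_counts[g] - n  # gt detections of g not matched to p
        fpa = pr_id_counts[p] - n  # pred detections of p not matched to g
        ass_a += n * n / (n + fna + fpa)
        ass_re += n * n / (n + fna)
        ass_pr += n * n / (n + fpa)
    ass_a /= tp
    ass_re /= tp
    ass_pr /= tp

    # Localization: mean similarity over all TPs.
    loc_a = sum(s for _, _, s in matches) / tp
    hota = math.sqrt(det_a * ass_a)
    return HotaAlpha(hota, det_a, ass_a, det_re, det_pr, ass_re, ass_pr, loc_a)


# Thresholds 0.05, 0.10, ..., 0.95 used to approximate the integral over alpha.
ALPHAS = [round(0.05 * k, 2) for k in range(1, 20)]


def hota_over_alphas(per_alpha):
    """Average HOTA_alpha over the alpha thresholds (one HotaAlpha per alpha)."""
    return sum(r.hota for r in per_alpha) / len(per_alpha)
```

For example, a single ground-truth track with 4 detections that is split between two predicted IDs (2 frames each, all matched) keeps DetA at 1 but halves AssA, so HOTA_α = sqrt(0.5) ≈ 0.707. This shows how an identity switch lowers the score even when detection is perfect, which is the balance between detection and association that the abstract describes.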
