Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics

Simultaneous tracking of multiple persons in real-world environments is an active research field, and several approaches have been proposed, based on a variety of features and algorithms. Recently, there has been a growing interest in organizing systematic evaluations to compare the various techniques. Unfortunately, the lack of common metrics for measuring the performance of multiple object trackers still makes it hard to compare their results. In this work, we introduce two intuitive and general metrics to allow for objective comparison of tracker characteristics, focusing on their precision in estimating object locations, their accuracy in recognizing object configurations, and their ability to consistently label objects over time. These metrics have been extensively used in two large-scale international evaluations, the 2006 and 2007 CLEAR evaluations, to measure and compare the performance of multiple object trackers for a wide variety of tracking tasks. Selected performance results are presented, and the advantages and drawbacks of the presented metrics are discussed based on the experience gained during the evaluations.
