Multiple Object Tracking Performance Metrics and Evaluation in a Smart Room Environment

Simultaneous tracking of multiple persons in real-world environments is an active research field, and several approaches have been proposed, based on a variety of features and algorithms. Recently, there has been growing interest in organizing systematic evaluations to compare the various techniques. Unfortunately, the lack of common metrics for measuring the performance of multiple object trackers still makes it hard to compare their results. In this work, we introduce two intuitive and general metrics that allow objective comparison of tracker characteristics, focusing on their precision in estimating object locations, their accuracy in recognizing object configurations, and their ability to consistently label objects over time. We also present a novel system for tracking multiple users in a smart room environment using several cameras, based on color histogram tracking of person regions and automatic initialization using special object detectors. This system is used to demonstrate the expressiveness of the proposed metrics through a sample performance evaluation using real test video sequences of people interacting in the
