Analysis of object description methods in a video object tracking environment

A key issue in video object tracking is how objects are represented and how effectively that representation discriminates between different objects. Several techniques have been proposed, but none is generally accepted. While analyses and comparisons of these individual methods have been presented in the literature, their evaluation as part of a global solution has been overlooked. The appearance model for the objects is one component of a video object tracking framework: it depends on previous processing stages and affects those that succeed it. As a result, these interdependencies should be taken into account when analysing the performance of object description techniques. We propose an integrated analysis of object descriptors and appearance models through their comparison in a common object tracking solution. The goal is to contribute to a better understanding of object description methods and their impact on the tracking process. Our contributions are threefold: we propose a novel descriptor evaluation and characterisation paradigm; we perform the first integrated analysis of state-of-the-art description methods in a people tracking scenario; and we put forward some ideas for appearance models to use in this context. This work provides foundations for future tests, and the proposed assessment approach contributes to the informed selection of the techniques most adequate for a given tracking application context.
