Leveraging single for multi-target tracking using a novel trajectory overlap affinity measure

Multi-target tracking (MTT) is the task of localizing objects of interest in a video and associating them through time. Accurate affinity measures between object detections are crucial for MTT. Previous methods use simple, heuristic affinity measures that fail to track through occlusions and missing detections. To address this problem, this paper proposes a novel affinity measure that leverages single-target visual tracking (VT), which has proven reliable for tracking objects of interest locally given a bounding-box initialization. In particular, given two detections at different frames, we run VT from each detection towards the frame of the other. We then learn a metric from features extracted from the behaviours (e.g. overlaps and distances) of the two tracking trajectories. By plugging our learned affinity into the standard MTT framework, we are able to cope with occlusions and large amounts of missing or inaccurate detections. We evaluate our method on public datasets, including the popular MOT benchmark, and show improvements over previously published methods.
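
The trajectory comparison described above can be illustrated with a minimal sketch. It assumes the two VT runs have already produced one box per frame over the same frame range, and it computes per-frame overlap (intersection over union) and centre distance between the two trajectories. The function names, the `(x1, y1, x2, y2)` box format, and the particular summary statistics are illustrative assumptions, not the feature set used in the paper; the single-target tracker and the metric learning step are left out.

```python
import math
from typing import Dict, List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def center_distance(a: Box, b: Box) -> float:
    """Euclidean distance between the centres of two boxes."""
    ax, ay = (a[0] + a[2]) / 2.0, (a[1] + a[3]) / 2.0
    bx, by = (b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0
    return math.hypot(ax - bx, ay - by)


def trajectory_features(forward: List[Box], backward: List[Box]) -> Dict[str, float]:
    """Summarize how well two tracking trajectories agree.

    forward[k] and backward[k] must refer to the same video frame:
    `forward` comes from tracking the earlier detection forward in time,
    `backward` from tracking the later detection backward (already
    reordered so that both lists run in increasing frame order).
    """
    if not forward or len(forward) != len(backward):
        raise ValueError("trajectories must be non-empty and frame-aligned")
    overlaps = [iou(f, b) for f, b in zip(forward, backward)]
    dists = [center_distance(f, b) for f, b in zip(forward, backward)]
    return {
        "mean_iou": sum(overlaps) / len(overlaps),
        "min_iou": min(overlaps),
        "mean_dist": sum(dists) / len(dists),
        "max_dist": max(dists),
    }


if __name__ == "__main__":
    fwd = [(0, 0, 10, 10), (2, 0, 12, 10), (4, 0, 14, 10)]
    bwd = [(1, 0, 11, 10), (2, 0, 12, 10), (4, 1, 14, 11)]
    print(trajectory_features(fwd, bwd))
```

A feature vector of this kind would then be passed to a learned classifier or metric to produce the affinity score between the two detections; high overlap and small distances suggest that both detections belong to the same target.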
