A large margin framework for single camera offline tracking with hybrid cues

We introduce MMTrack (max-margin tracker), a single-target tracker that linearly combines constant and adaptive appearance features. We frame offline single-camera tracking as a structured output prediction task where the goal is to find a sequence of locations of the target given a video. Following recent advances in machine learning, we discriminatively learn tracker parameters by first generating suitable bad trajectories and then employing a margin criterion to learn how to distinguish among ground truth trajectories and all other possibilities. Our framework for tracking is general, and can be used with a variety of features. We demonstrate a system combining a variety of appearance features and a motion model, with the parameters of these features learned jointly in a coherent learning framework. Further, taking advantage of a reliable human detector, we present a natural way of extending our tracker to a robust detection and tracking system. We apply our framework to pedestrian tracking and experimentally demonstrate the effectiveness of our method on two real-world data sets, achieving results comparable to state-of-the-art tracking systems.

[1]  Konrad Schindler,et al.  Globally Optimal Multi-target Tracking on a Hexagonal Lattice , 2010, ECCV.

[2]  Trevor Darrell,et al.  Integrated Person Tracking Using Stereo, Color, and Pattern Detection , 2000, International Journal of Computer Vision.

[3]  Dariu Gavrila,et al.  A Bayesian Framework for Multi-cue 3D Object Tracking , 2004, ECCV.

[4]  Gary Bradski,et al.  Computer Vision Face Tracking For Use in a Perceptual User Interface , 1998 .

[5]  Bernt Schiele,et al.  Towards Robust Multi-cue Integration for Visual Tracking , 2001, ICVS.

[6]  Ming-Hsuan Yang,et al.  Visual tracking with histograms and articulating blocks , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Ehud Rivlin,et al.  A probabilistic framework for combining tracking algorithms , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[8]  Mathias Kölsch,et al.  Fast 2D Hand Tracking with Flocks of Features and Multi-Cue Integration , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[9]  Xin Li,et al.  Contour-based object tracking with occlusion handling in video acquired using mobile cameras , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Jing Zhang,et al.  Framework for Performance Evaluation of Face, Text, and Vehicle Detection and Tracking in Video: Data, Metrics, and Protocol , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Jihun Park,et al.  Accurate object contour tracking based on boundary edge selection , 2007, Pattern Recognit..

[12]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13]  Justus H. Piater,et al.  A Probabilistic Approach to Integrating Multiple Cues in Visual Tracking , 2008, ECCV.

[14]  Paul A. Viola,et al.  Robust Real-time Object Detection , 2001 .

[15]  Trevor Darrell,et al.  Conditional Random People: Tracking Humans with CRFs and Grid Filters , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[16]  Roberto Cipolla,et al.  Learning to track with multiple observers , 2009, CVPR.

[17]  Yanxi Liu,et al.  Online Selection of Discriminative Tracking Features , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[18]  T. Sayed,et al.  Automated Collection of Pedestrian Data Using Computer Vision Techniques , 2009 .

[19]  Fatih Murat Porikli,et al.  Integral histogram: a fast way to extract histograms in Cartesian spaces , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[20]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[21]  Luc Van Gool,et al.  Online Multiperson Tracking-by-Detection from a Single, Uncalibrated Camera , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Ming-Hsuan Yang,et al.  Visual tracking with online Multiple Instance Learning , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Thomas Hofmann,et al.  Support vector machine learning for interdependent and structured output spaces , 2004, ICML.

[24]  Dimitrios Makris,et al.  Performance evaluation of object tracking algorithms , 2007 .

[25]  Francesc Moreno-Noguer,et al.  Dependent Multiple Cue Integration for Robust Tracking , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  ZhangJing,et al.  Framework for Performance Evaluation of Face, Text, and Vehicle Detection and Tracking in Video , 2009 .

[27]  Ming-Hsuan Yang,et al.  Incremental Learning for Robust Visual Tracking , 2008, International Journal of Computer Vision.

[28]  Trevor Darrell,et al.  Integrated Person Tracking Using Stereo, Color, and Pattern Detection , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[29]  Ze-Nian Li,et al.  Max-Margin Offline Pedestrian Tracking with Multiple Cues , 2010, 2010 Canadian Conference on Computer and Robot Vision.

[30]  Pascal Fua,et al.  Robust People Tracking with Global Trajectory Optimization , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[31]  James W. Davis,et al.  A perceptual user interface for recognizing head gesture acknowledgements , 2001, PUI '01.

[32]  Ying Wu,et al.  Robust Visual Tracking by Integrating Multiple Cues Based on Co-Inference Learning , 2004, International Journal of Computer Vision.

[33]  Dorin Comaniciu,et al.  Kernel-Based Object Tracking , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[34]  David A. Forsyth,et al.  Building models of animals from video , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Carlo Tomasi,et al.  Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.