Human Tracking Using Convolutional Neural Networks

In this paper, we treat tracking as a learning problem of estimating the location and the scale of an object given its previous location, scale, as well as current and previous image frames. Given a set of examples, we train convolutional neural networks (CNNs) to perform the above estimation task. Different from other learning methods, the CNNs learn both spatial and temporal features jointly from image pairs of two adjacent frames. We introduce multiple path ways in CNN to better fuse local and global information. A creative shift-variant CNN architecture is designed so as to alleviate the drift problem when the distracting objects are similar to the target in cluttered environment. Furthermore, we employ CNNs to estimate the scale through the accurate localization of some key points. These techniques are object-independent so that the proposed method can be applied to track other types of object. The capability of the tracker of handling complex situations is demonstrated in many testing sequences.

[1]  D. Hubel,et al.  Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.

[2]  Yann LeCun,et al.  Multi-Digit Recognition Using a Space Displacement Neural Network , 1991, NIPS.

[3]  John C. Platt,et al.  A Convolutional Neural Network Hand Tracker , 1994, NIPS.

[4]  Ah Chung Tsoi,et al.  Face recognition: a convolutional neural-network approach , 1997, IEEE Trans. Neural Networks.

[5]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[6]  Dorin Comaniciu,et al.  Real-time tracking of non-rigid objects using mean shift , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[7]  Dorin Comaniciu,et al.  The Variable Bandwidth Mean Shift and Data-Driven Scale Selection , 2001, ICCV.

[8]  David J. Fleet,et al.  Robust Online Appearance Models for Visual Tracking , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Robert T. Collins,et al.  Mean-shift blob tracking through scale space , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[10]  R. Collins,et al.  On-line selection of discriminative tracking features , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[11]  James J. Little,et al.  A Boosted Particle Filter: Multitarget Detection and Tracking , 2004, ECCV.

[12]  Shai Avidan,et al.  Support vector tracking , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Ramakant Nevatia,et al.  Tracking multiple humans in complex situations , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  David J. Kriegman,et al.  Visual tracking using learned linear subspaces , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[15]  Horst Bischof,et al.  On-line Boosting and Vision , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[16]  Ramakant Nevatia,et al.  Detection and Tracking of Multiple, Partially Occluded Humans by Bayesian Combination of Edgelet based Part Detectors , 2007, International Journal of Computer Vision.

[17]  Ying Wu,et al.  A field model for human detection and tracking , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Yuan Li,et al.  Tracking in Low Frame Rate Video: A Cascade Particle Filter with Discriminative Observers of Different Lifespans , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Ming-Hsuan Yang,et al.  Incremental Learning for Robust Visual Tracking , 2008, International Journal of Computer Vision.

[20]  Luc Van Gool,et al.  Coupled Detection and Trajectory Estimation for Multi-Object Tracking , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[21]  Shai Avidan,et al.  Ensemble Tracking , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  D. Forsyth,et al.  Tracking People by Learning Their Appearance , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Ming Yang,et al.  A bi-subspace model for robust visual tracking , 2008, 2008 15th IEEE International Conference on Image Processing.

[24]  Yihong Gong,et al.  Training Hierarchical Feed-Forward Visual Recognition Models Using Transfer Learning from Pseudo-Tasks , 2008, ECCV.

[25]  Horst Bischof,et al.  Semi-supervised On-Line Boosting for Robust Tracking , 2008, ECCV.

[26]  Luc Van Gool,et al.  A mobile vision system for robust multi-person tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Stefan Roth,et al.  People-tracking-by-detection and people-detection-by-tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Ramakant Nevatia,et al.  Robust Object Tracking by Hierarchical Association of Detection Responses , 2008, ECCV.

[29]  Fatih Murat Porikli,et al.  Learning on lie groups for invariant detection and tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Ying Wu,et al.  Contextual flow , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Serge J. Belongie,et al.  Visual tracking with online Multiple Instance Learning , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Ming Yang,et al.  Tracking Nonstationary Visual Appearances by Data-Driven Adaptation , 2009, IEEE Transactions on Image Processing.

[33]  Tobi Delbrück,et al.  CAVIAR: A 45k Neuron, 5M Synapse, 12G Connects/s AER Hardware Sensory–Processing– Learning–Actuating System for High-Speed Visual Object Recognition and Tracking , 2009, IEEE Transactions on Neural Networks.