Track Everything: Limiting Prior Knowledge in Online Multi-Object Recognition

This paper addresses the problem of online tracking and classification of multiple objects in an image sequence. Our proposed solution is to first track all objects in the scene without relying on object-specific prior knowledge, which in other systems can take the form of hand-crafted features or user-based track initialization. We then classify the tracked objects with a fast-learning image classifier based on a shallow convolutional neural network architecture, and demonstrate that object recognition improves when this classifier is combined with object state information from the tracking algorithm. We argue that by moving the use of prior knowledge from the detection and tracking stages to the classification stage, we can design a robust, general-purpose object recognition system that can detect and track a variety of object types. We describe our biologically inspired implementation, which adaptively learns the shape and motion of tracked objects, and apply it to the Neovision2 Tower benchmark data set, which contains multiple object types. An experimental evaluation shows that our approach is competitive with state-of-the-art video object recognition systems that do use object-specific prior knowledge in detection and tracking, while offering additional practical advantages by virtue of its generality.
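The central claim above is that per-frame classification improves when it is combined with object state from the tracker. The block below is a minimal, illustrative sketch of one way such a combination could work; it is not the authors' method. It assumes a naive-Bayes style fusion (uniform class priors; frames, and appearance versus motion, conditionally independent given the class) and a hypothetical per-class Gaussian model of tracked speed. The names `fuse_track`, `gaussian_logpdf` and `speed_model`, the class labels and all numbers are invented for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical types: one frame = (classifier probabilities per class, tracked speed).
Frame = Tuple[Dict[str, float], float]
SpeedModel = Dict[str, Tuple[float, float]]  # class -> (mean, std) of speed


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    """Log density of a 1-D normal distribution."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def fuse_track(frames: List[Frame], speed_model: Optional[SpeedModel] = None,
               eps: float = 1e-9) -> Dict[str, float]:
    """Fuse per-frame classifier outputs (and optionally tracked speed) over one track.

    Assumes every frame scores the same set of classes, uniform class priors,
    and conditional independence given the class, so evidence adds in log space.
    """
    log_score: Dict[str, float] = {}
    for probs, speed in frames:
        for cls, p in probs.items():
            evidence = math.log(max(p, eps))  # appearance evidence from the classifier
            if speed_model is not None:
                mean, std = speed_model[cls]
                evidence += gaussian_logpdf(speed, mean, std)  # motion evidence from the tracker
            log_score[cls] = log_score.get(cls, 0.0) + evidence
    # Normalise with the log-sum-exp trick to obtain a posterior over classes.
    top = max(log_score.values())
    weights = {cls: math.exp(s - top) for cls, s in log_score.items()}
    total = sum(weights.values())
    return {cls: w / total for cls, w in weights.items()}


if __name__ == "__main__":
    speed_model = {"car": (5.0, 2.0), "person": (1.0, 0.5)}  # illustrative values only
    frames = [({"car": 0.45, "person": 0.55}, 4.8),
              ({"car": 0.50, "person": 0.50}, 5.2),
              ({"car": 0.40, "person": 0.60}, 5.0)]
    appearance_only = fuse_track(frames)
    with_motion = fuse_track(frames, speed_model)
    print(max(appearance_only, key=appearance_only.get))  # person
    print(max(with_motion, key=with_motion.get))  # car
```

In this toy example the classifier alone slightly favours the wrong label on every frame, while the speed measured by the tracker is far more consistent with the other class, so the fused decision changes. Accumulating evidence over the whole track, rather than deciding frame by frame, is the design choice that lets weak but consistent cues add up.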
