Simulating vision through time: Hierarchical, sparse models of visual cortex for motion imagery

Efficient pattern recognition in motion imagery is a growing challenge as video sources proliferate worldwide. Historically, automated analysis of motion imagery, such as object detection, classification, and tracking, has relied on hand-designed feature detectors. Though useful, these detectors are often task-specific and typically require substantial design effort, so they are not easily extended to new data sets or new target categories. As an alternative to hand-designed filters, recent advances in image processing have produced a theoretical framework of sparse, hierarchical, learned representations that can describe video of natural scenes at many spatial and temporal scales and at many levels of object complexity. These sparse, hierarchical models learn the information content of imagery and video from the data itself and lead to state-of-the-art performance and more efficient processing. Processing efficiency matters because it allows research to scale to dataset sizes and numbers of categories approaching real-world conditions. We describe recent work at Los Alamos National Laboratory developing hierarchical sparse learning computer vision models that can process high-definition color video in real time. We present preliminary results extending our prior work on object classification in still imagery [1] to the discovery of useful features at different time scales in motion imagery for detection, classification, and tracking of objects.
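The abstract does not say how a sparse representation is computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a plain L1-penalized sparse coding objective, 0.5*||x - D a||^2 + lam*||a||_1, solved with the iterative shrinkage-thresholding algorithm (ISTA). The dictionary `D`, the penalty `lam`, and the function names are all hypothetical. A learned, hierarchical model would also update `D` from data and stack several such layers, which this sketch omits.

```python
import math
import random


def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal step for the L1 penalty)."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def objective(D, x, a, lam):
    """Evaluate 0.5*||x - D a||^2 + lam*||a||_1."""
    n, k = len(D), len(D[0])
    resid = [x[i] - sum(D[i][j] * a[j] for j in range(k)) for i in range(n)]
    return 0.5 * sum(r * r for r in resid) + lam * sum(abs(c) for c in a)


def ista(D, x, lam=0.1, n_iter=200):
    """Approximate the sparse code a of signal x under dictionary D.

    D is a list of rows (n_samples x n_atoms); x has length n_samples.
    """
    n, k = len(D), len(D[0])
    # Step size 1/L. The squared Frobenius norm of D is an upper bound on the
    # largest eigenvalue of D^T D, so this step is safe (if conservative).
    L = sum(d * d for row in D for d in row)
    a = [0.0] * k
    for _ in range(n_iter):
        # Residual r = D a - x, then gradient g = D^T r of the quadratic term.
        r = [sum(D[i][j] * a[j] for j in range(k)) - x[i] for i in range(n)]
        g = [sum(D[i][j] * r[i] for i in range(n)) for j in range(k)]
        # Gradient step followed by soft thresholding.
        a = soft_threshold([a[j] - g[j] / L for j in range(k)], lam / L)
    return a


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical random dictionary: 6-sample signals, 8 atoms (overcomplete).
    D = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(6)]
    # Synthetic signal built from two atoms only.
    x = [2.0 * D[i][1] - 1.5 * D[i][5] for i in range(6)]
    a = ista(D, x, lam=0.1, n_iter=500)
    print("code:", [round(c, 3) for c in a])
    print("objective:", objective(D, x, a, 0.1))
```

The conservative step size keeps each iteration from increasing the objective, at the cost of slow convergence. A practical system would use a tighter Lipschitz estimate or an accelerated solver.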

[1]  Yi Ma,et al.  Robust principal component analysis? , 2009, JACM.

[2]  Namrata Vaswani,et al.  Recursive sparse recovery in large but correlated noise , 2011, 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[3]  Christof Koch,et al.  A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .

[4]  Thomas Serre,et al.  Automated home-cage behavioural phenotyping of mice. , 2010, Nature communications.

[5]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[6]  Thomas Serre,et al.  A feedforward architecture accounts for rapid categorization , 2007, Proceedings of the National Academy of Sciences.

[7]  Rick Chartrand,et al.  Nonconvex Regularization for Shape Preservation , 2007, 2007 IEEE International Conference on Image Processing.

[8]  Fei-Fei Li,et al.  What Does Classifying More Than 10,000 Image Categories Tell Us? , 2010, ECCV.

[9]  Guillermo Sapiro,et al.  Are You Imitating Me? Unsupervised Sparse Modeling for Group Activity Analysis from a Single Video , 2012, ArXiv.

[10]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004.

[11]  Brendt Wohlberg,et al.  Inpainting by Joint Optimization of Linear Combinations of Exemplars , 2011, IEEE Signal Processing Letters.

[12]  Garrett T. Kenyon,et al.  Large-scale functional models of visual cortex for remote sensing , 2009, 2009 IEEE Applied Imagery Pattern Recognition Workshop (AIPR 2009).

[13]  Garrett T. Kenyon,et al.  Combining multiple visual processing streams for locating and classifying objects in video , 2012, 2012 IEEE Southwest Symposium on Image Analysis and Interpretation.

[14]  Eric Feron,et al.  Trajectory Clustering and an Application to Airspace Monitoring , 2010, IEEE Transactions on Intelligent Transportation Systems.

[15]  Kazuhiro Otsuka,et al.  Real-time Visual Tracker by Stream Processing , 2009, J. Signal Process. Syst.

[16]  Rick Chartrand,et al.  Nonconvex Splitting for Regularized Low-Rank + Sparse Decomposition , 2012, IEEE Transactions on Signal Processing.

[17]  Giuseppe Durisi,et al.  Real-time principal component pursuit , 2011, 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR).

[18]  R. Chartrand,et al.  Restricted isometry properties and nonconvex compressive sensing , 2007 .

[19]  Mohan M. Trivedi,et al.  Trajectory Learning for Activity Understanding: Unsupervised, Multilevel, and Long-Term Adaptive Approach , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[21]  Tom Minka,et al.  Principled Hybrids of Generative and Discriminative Models , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[22]  Pietro Perona,et al.  Social behavior recognition in continuous video , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Yongmin Li,et al.  On incremental and robust subspace learning , 2004, Pattern Recognit.

[24]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[25]  Michael Elad,et al.  Stable recovery of sparse overcomplete representations in the presence of noise , 2006, IEEE Transactions on Information Theory.

[26]  D. Grier,et al.  Methods of Digital Video Microscopy for Colloidal Studies , 1996 .

[27]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[28]  Namrata Vaswani,et al.  Real-time Robust Principal Components' Pursuit , 2010, 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[29]  Rick Chartrand,et al.  Fast algorithms for nonconvex compressive sensing: MRI reconstruction from very few data , 2009, 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro.

[30]  Kevin Skadron,et al.  Parallelization of particle filter algorithms , 2010, ISCA'10.

[31]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[32]  Laura Balzano,et al.  Incremental gradient on the Grassmannian for online foreground and background separation in subsampled video , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Rayan Saab,et al.  Stable sparse approximations via nonconvex optimization , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[34]  Dacheng Tao,et al.  GoDec: Randomized Lowrank & Sparse Matrix Decomposition in Noisy Case , 2011, ICML.

[35]  G. Giannakis,et al.  Sparsity control for robust principal component analysis , 2010, 2010 Conference Record of the Forty Fourth Asilomar Conference on Signals, Systems and Computers.

[36]  James Theiler,et al.  Local principal component pursuit for nonlinear datasets , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[37]  John Wright,et al.  Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization , 2009, NIPS.

[38]  Ling Liu,et al.  NEAT: Road Network Aware Trajectory Clustering , 2012, 2012 IEEE 32nd International Conference on Distributed Computing Systems.

[39]  Guillermo Sapiro,et al.  Sparse Representation for Computer Vision and Pattern Recognition , 2010, Proceedings of the IEEE.

[40]  David W. Capson,et al.  A Framework for 3D Model-Based Visual Tracking Using a GPU-Accelerated Particle Filter , 2012, IEEE Transactions on Visualization and Computer Graphics.

[41]  Michael Elad,et al.  Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[42]  Rick Chartrand,et al.  Exact Reconstruction of Sparse Signals via Nonconvex Minimization , 2007, IEEE Signal Processing Letters.

[43]  Jean Ponce,et al.  Task-Driven Dictionary Learning , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[45]  Zhixun Su,et al.  Solving Principal Component Pursuit in Linear Time via $l_1$ Filtering , 2011, ArXiv.

[46]  Emmanuel J. Candès,et al.  Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information , 2004, IEEE Transactions on Information Theory.

[47]  Wotao Yin,et al.  Iteratively reweighted algorithms for compressive sensing , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.