Context Dependent Encoding Using Convolutional Dynamic Networks

Perception of sensory signals is strongly influenced by their context, both in space and time. In this paper, we propose a novel hierarchical model, called convolutional dynamic networks, that effectively utilizes this contextual information, while inferring the representations of the visual inputs. We build this model based on a predictive coding framework and use the idea of empirical priors to incorporate recurrent and top-down connections. These connections endow the model with contextual information coming from temporal as well as abstract knowledge from higher layers. To perform inference efficiently in this hierarchical model, we rely on a novel scheme based on a smoothing proximal gradient method. When trained on unlabeled video sequences, the model learns a hierarchy of stable attractors, representing low-level to high-level parts of the objects. We demonstrate that the model effectively utilizes contextual information to produce robust and stable representations for object recognition in video sequences, even in case of highly corrupted inputs.

[1]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[3]  Rama Chellappa,et al.  Dictionary-Based Face Recognition from Video , 2012, ECCV.

[4]  Misha Denil,et al.  Recklessly Approximate Sparse Coding , 2012, ArXiv.

[5]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[6]  P. Dayan,et al.  Space and time in visual context , 2007, Nature Reviews Neuroscience.

[7]  Ruiping Wang,et al.  Manifold Discriminant Analysis , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Vincent Lepetit,et al.  Are sparse representations really relevant for image classification? , 2011, CVPR 2011.

[9]  Yurii Nesterov,et al.  Smooth minimization of non-smooth functions , 2005, Math. Program..

[10]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[11]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[13]  Vladimir Pavlovic,et al.  Face tracking and recognition with visual constraints in real-world videos , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Aapo Hyvärinen,et al.  A two-layer sparse coding model learns simple and complex cell receptive fields and topography from natural images , 2001, Vision Research.

[15]  Terrence J. Sejnowski,et al.  Slow Feature Analysis: Unsupervised Learning of Invariances , 2002, Neural Computation.

[16]  Shenghuo Zhu,et al.  Deep Learning of Invariant Features via Simulated Fixations in Video , 2012, NIPS.

[17]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[18]  Conrad Sanderson,et al.  Biometric Person Recognition: Face, Speech and Fusion , 2008 .

[19]  Paul A. Viola,et al.  Robust Real-time Object Detection , 2001 .

[20]  Yann LeCun,et al.  What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[21]  Karl J. Friston,et al.  A theory of cortical responses , 2005, Philosophical Transactions of the Royal Society B: Biological Sciences.

[22]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[23]  Larry S. Davis,et al.  Covariance discriminative learning: A natural and efficient approach to image set classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Sameer A. Nene,et al.  Columbia Object Image Library (COIL100) , 1996 .

[25]  Tai Sing Lee,et al.  Hierarchical Bayesian inference in the visual cortex. , 2003, Journal of the Optical Society of America. A, Optics, image science, and vision.

[26]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[27]  José Carlos Príncipe,et al.  Deep Predictive Coding Networks , 2013, ICLR.

[28]  D. George,et al.  A hierarchical Bayesian model of invariant pattern recognition in the visual cortex , 2005, Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005..

[29]  Ajmal S. Mian,et al.  Sparse approximated nearest points for image set classification , 2011, CVPR 2011.

[30]  Honglak Lee,et al.  Sparse deep belief net model for visual area V2 , 2007, NIPS.

[31]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[32]  David J. Kriegman,et al.  Visual tracking and recognition using probabilistic appearance manifolds , 2005, Comput. Vis. Image Underst..

[33]  D. J. Felleman,et al.  Distributed hierarchical processing in the primate cerebral cortex. , 1991, Cerebral cortex.

[34]  Graham W. Taylor,et al.  Deconvolutional networks , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[35]  Michael S. Lewicki,et al.  A Hierarchical Bayesian Model for Learning Nonlinear Statistical Regularities in Nonstationary Natural Signals , 2005, Neural Computation.

[36]  Y-Lan Boureau,et al.  Learning Convolutional Feature Hierarchies for Visual Recognition , 2010, NIPS.

[37]  Marc Teboulle,et al.  A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems , 2009, SIAM J. Imaging Sci..

[38]  Graham W. Taylor,et al.  Adaptive deconvolutional networks for mid and high level feature learning , 2011, 2011 International Conference on Computer Vision.

[39]  Rajat Raina,et al.  Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.

[40]  Geoffrey E. Hinton,et al.  An Efficient Learning Procedure for Deep Boltzmann Machines , 2012, Neural Computation.

[41]  Karl J. Friston Hierarchical Models in the Brain , 2008, PLoS Comput. Biol..

[42]  C. Gilbert,et al.  Top-down influences on visual processing , 2013, Nature Reviews Neuroscience.

[43]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[44]  Rama Chellappa,et al.  Kernel Learning for Extrinsic Classification of Manifold Features , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[45]  Bruno A. Olshausen,et al.  Learning Transformational Invariants from Natural Movies , 2008, NIPS.

[46]  Rajesh P. N. Rao,et al.  Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. , 1999 .

[47]  Heiko Wersing,et al.  Learning Optimized Features for Hierarchical Models of Invariant Object Recognition , 2003, Neural Computation.

[48]  Karl J. Friston,et al.  Recognizing Sequences of Sequences , 2009, PLoS Comput. Biol..

[49]  Remo Guidieri Res , 1995, RES: Anthropology and Aesthetics.

[50]  Karl J. Friston,et al.  Cortical circuits for perceptual inference , 2009, Neural Networks.

[51]  José Carlos Príncipe,et al.  A fast proximal method for convolutional sparse coding , 2013, The 2013 International Joint Conference on Neural Networks (IJCNN).

[52]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[53]  Honglak Lee,et al.  Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[54]  David B. Dunson,et al.  The Hierarchical Beta Process for Convolutional Factor Analysis and Deep Learning , 2011, ICML.

[55]  Rajesh P. N. Rao,et al.  Dynamic Model of Visual Recognition Predicts Neural Response Properties in the Visual Cortex , 1997, Neural Computation.

[56]  Quoc V. Le,et al.  Measuring Invariances in Deep Networks , 2009, NIPS.

[57]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[58]  M. Bar Visual objects in context , 2004, Nature Reviews Neuroscience.

[59]  Karl J. Friston,et al.  A Hierarchy of Time-Scales and the Brain , 2008, PLoS Comput. Biol..

[60]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[61]  Hossein Mobahi,et al.  Deep learning from temporal coherence in video , 2009, ICML '09.

[62]  Xi Chen,et al.  Smoothing proximal gradient method for general structured sparse regression , 2010, The Annals of Applied Statistics.

[63]  Hakan Cevikalp,et al.  Face recognition based on image sets , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.