Visual Recognition and Inference Using Dynamic Overcomplete Sparse Learning

Abstract: We present a hierarchical architecture and learning algorithm for visual recognition and other visual inference tasks such as imagination, reconstruction of occluded images, and expectation-driven segmentation. Using properties of biological vision for guidance, we posit a stochastic generative world model and from it develop a simplified world model (SWM) based on a tractable variational approximation that is designed to enforce sparse coding. Recent developments in computational methods for learning overcomplete representations (Lewicki & Sejnowski, 2000; Teh, Welling, Osindero, & Hinton, 2003) suggest that overcompleteness can be useful for visual tasks, and we use an overcomplete dictionary learning algorithm (Kreutz-Delgado et al., 2003) as a preprocessing stage to produce accurate, sparse codings of images. Inference is performed by constructing a dynamic multilayer network with feedforward, feedback, and lateral connections, which is trained to approximate the SWM. Learning is done with a variant of the back-propagation-through-time algorithm, which encourages convergence to desired states within a fixed number of iterations. Vision tasks require large networks, and to make learning efficient, we take advantage of the sparsity of each layer to update only a small subset of elements in a large weight matrix at each iteration. Experiments on a set of rotated objects demonstrate various types of visual inference and show that increasing the degree of overcompleteness improves recognition performance in difficult scenes with occluded objects in clutter.
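
The efficiency idea in the abstract (touching only a small subset of a large weight matrix when the layers are sparse) can be illustrated with a minimal sketch. This is not the paper's back-propagation-through-time variant: it assumes a plain outer-product (delta-rule style) update, and the names `sparse_outer_update`, `dense_outer_update`, `sparse_vector`, `err`, `act`, and `lr` are hypothetical, introduced only for illustration. The point it shows is that when both the error vector and the activation vector are sparse, updating only the active rows and columns gives the same result as the dense update at a fraction of the cost.

```python
import random

def sparse_outer_update(W, err, act, lr=0.1):
    """In place: W[i][j] -= lr * err[i] * act[j], visiting only rows where
    err is nonzero and columns where act is nonzero (illustrative rule)."""
    rows = [i for i, e in enumerate(err) if e != 0.0]
    cols = [j for j, a in enumerate(act) if a != 0.0]
    for i in rows:
        for j in cols:
            W[i][j] -= lr * err[i] * act[j]
    return len(rows) * len(cols)  # number of weights actually touched

def dense_outer_update(W, err, act, lr=0.1):
    """Reference version that visits every element of W."""
    for i in range(len(err)):
        for j in range(len(act)):
            W[i][j] -= lr * err[i] * act[j]

def sparse_vector(n, k, rng):
    """Length-n vector with exactly k nonzero entries (values in [0.5, 1])."""
    v = [0.0] * n
    for idx in rng.sample(range(n), k):
        v[idx] = rng.uniform(0.5, 1.0)
    return v

rng = random.Random(0)
n_out, n_in = 200, 300                 # assumed layer sizes
act = sparse_vector(n_in, 10, rng)     # sparse presynaptic activations
err = sparse_vector(n_out, 8, rng)     # sparse postsynaptic error signal

W_sparse = [[0.0] * n_in for _ in range(n_out)]
W_dense = [[0.0] * n_in for _ in range(n_out)]

touched = sparse_outer_update(W_sparse, err, act)
dense_outer_update(W_dense, err, act)

print(touched, "of", n_out * n_in, "weights updated")
print("matches dense update:", W_sparse == W_dense)
```

With 8 active error units and 10 active inputs, only 80 of the 60,000 weights are visited, and the resulting matrix is identical to the dense update because every skipped element would have received a zero increment.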

[1]  F. Attneave.  Some informational aspects of visual perception , 1954, Psychological Review.

[2]  D. Hubel,et al.  Receptive fields of single neurones in the cat's striate cortex , 1959, The Journal of physiology.

[3]  D. Brook.  On the distinction between the conditional probability and the joint probability approaches in the specification of nearest-neighbour systems , 1964 .

[4]  J. S. Barlow.  The mindful brain: B.M. Edelman and V.B. Mountcastle (MIT Press, Cambridge, Mass., 1978, 100 p., U.S. $10.00) , 1979 .

[5]  J J Hopfield,et al.  Neural networks and physical systems with emergent collective computational abilities. , 1982, Proceedings of the National Academy of Sciences of the United States of America.

[6]  Kunihiko Fukushima,et al.  Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position , 1982, Pattern Recognit..

[7]  Geoffrey E. Hinton,et al.  OPTIMAL PERCEPTUAL INFERENCE , 1983 .

[8]  Donald Geman,et al.  Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Geoffrey E. Hinton,et al.  A Learning Algorithm for Boltzmann Machines , 1985, Cogn. Sci..

[10]  M. Alexander,et al.  Principles of Neural Science , 1981 .

[11]  Carsten Peterson,et al.  A Mean Field Theory Learning Algorithm for Neural Networks , 1987, Complex Syst..

[12]  M. Mézard,et al.  Spin Glass Theory and Beyond , 1987 .

[13]  Sompolinsky,et al.  Dynamics of spin systems with randomly asymmetric bonds: Ising spins and Glauber dynamics. , 1988, Physical review. A, General physics.

[14]  Jing Peng,et al.  An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories , 1990, Neural Computation.

[15]  Peter Földiák,et al.  Learning Invariance from Transformation Sequences , 1991, Neural Comput..

[16]  Anders Krogh,et al.  Introduction to the theory of neural computation , 1994, The advanced book program.

[17]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[18]  D. J. Felleman,et al.  Distributed hierarchical processing in the primate cerebral cortex. , 1991, Cerebral cortex.

[19]  Albrecht Rau,et al.  Statistical mechanics of neural networks , 1992 .

[20]  C. Galland.  The limitations of deterministic Boltzmann machine learning , 1993 .

[21]  David J. Field,et al.  What Is the Goal of Sensory Coding? , 1994, Neural Computation.

[22]  J W Belliveau,et al.  Borders of multiple visual areas in humans revealed by functional magnetic resonance imaging. , 1995, Science.

[23]  Dimitri P. Bertsekas,et al.  Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[24]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[25]  Denis Fize,et al.  Speed of processing in the human visual system , 1996, Nature.

[26]  David J. Field,et al.  Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.

[27]  S. Kosslyn,et al.  Neural Systems Shared by Visual Imagery and Visual Perception: A Positron Emission Tomography Study , 1997, NeuroImage.

[28]  Geoffrey E. Hinton,et al.  Generative models for discovering sparse distributed representations. , 1997, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[29]  Rajesh P. N. Rao,et al.  Dynamic Model of Visual Recognition Predicts Neural Response Properties in the Visual Cortex , 1997, Neural Computation.

[30]  Michael I. Jordan.  Learning in Graphical Models , 1999, NATO ASI Series.

[31]  Robert Hecht-Nielsen,et al.  A Theory of the Cerebral Cortex , 1998, ICONIP.

[32]  D. Mumford,et al.  The role of the primary visual cortex in higher level vision , 1998, Vision Research.

[33]  Rajesh P. N. Rao,et al.  An optimal estimation approach to visual perception and learning , 1999, Vision Research.

[34]  T. Poggio,et al.  Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[35]  David Barber,et al.  Gaussian Fields for Approximate Inference in Layered Sigmoid Belief Networks , 1999, NIPS.

[36]  J. Gill,et al.  Generalized Linear Models: A Unified Approach , 2000 .

[37]  H. Kappen,et al.  Mean field theory for asymmetric neural networks. , 2000, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics.

[38]  J L Gallant,et al.  Sparse coding and decorrelation in primary visual cortex during natural vision. , 2000, Science.

[39]  Terrence J. Sejnowski,et al.  Learning Overcomplete Representations , 2000, Neural Computation.

[40]  C. Koch,et al.  Category-specific visual responses of single neurons in the human medial temporal lobe , 2000, Nature Neuroscience.

[41]  Yee Whye Teh,et al.  Rate-coded Restricted Boltzmann Machines for Face Recognition , 2000, NIPS.

[42]  Chandan Dasgupta,et al.  Retrieval Properties of a Hopfield Model with Random Asymmetric Interactions , 2000, Neural Computation.

[43]  Edmund T. Rolls,et al.  A Model of Invariant Object Recognition in the Visual System: Learning Rules, Activation Functions, Lateral Inhibition, and Information-Based Performance Measures , 2000, Neural Computation.

[44]  Joseph F. Murray,et al.  An improved FOCUSS-based learning algorithm for solving sparse linear inverse problems , 2001, Conference Record of Thirty-Fifth Asilomar Conference on Signals, Systems and Computers (Cat.No.01CH37256).

[45]  C. Stevens.  An evolutionary scaling law for the primate visual system and its basis in cortical function , 2001, Nature.

[46]  Steven Kay,et al.  Fundamentals Of Statistical Signal Processing , 2001 .

[47]  A. Hyvärinen,et al.  A multi-layer sparse coding network learns contour coding from natural images , 2002, Vision Research.

[48]  Joseph F. Murray,et al.  Dictionary Learning Algorithms for Sparse Representation , 2003, Neural Computation.

[49]  Tai Sing Lee,et al.  Hierarchical Bayesian inference in the visual cortex. , 2003, Journal of the Optical Society of America. A, Optics, image science, and vision.

[50]  Yee Whye Teh,et al.  Approximate inference in Boltzmann machines , 2003, Artif. Intell..

[51]  Y. Ejima,et al.  Interindividual and interspecies variations of the extrastriate visual cortex , 2003, Neuroreport.

[52]  Yee Whye Teh,et al.  Energy-Based Models for Sparse Overcomplete Representations , 2003, J. Mach. Learn. Res..

[53]  Paola Campadelli,et al.  Asymmetric Boltzmann machines , 2004, Biological Cybernetics.

[54]  S. Grossberg,et al.  Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors , 1976, Biological Cybernetics.

[55]  Stephen Grossberg,et al.  Adaptive pattern classification and universal recoding: II. Feedback, expectation, olfaction, illusions , 1976, Biological Cybernetics.

[56]  Michael I. Jordan,et al.  An Introduction to Variational Methods for Graphical Models , 1999, Machine Learning.

[57]  O. Johnson.  Information Theory And The Central Limit Theorem , 2004 .

[58]  David J. Field,et al.  What is the other 85% of V1 doing? , 2004 .

[59]  J. Hawkins,et al.  On Intelligence , 2004 .

[60]  Edward M. Callaway,et al.  Feedforward, feedback and inhibitory connections in primate visual cortex , 2004, Neural Networks.

[61]  C. Koch,et al.  Invariant visual representation by single neurons in the human brain , 2005, Nature.

[62]  Kunihiko Fukushima,et al.  Restoring partly occluded patterns: a neural network model , 2005, Neural Networks.

[63]  Antonio Torralba,et al.  Learning hierarchical models of scenes, objects, and parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[64]  Joseph F. Murray,et al.  Visual recognition, inference and coding using learned sparse overcomplete representations , 2005 .

[65]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[66]  Pietro Perona,et al.  Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition , 2007, International Journal of Computer Vision.

[67]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[68]  T. Sejnowski,et al.  23 problems in systems neuroscience , 2006 .

[69]  Joseph F. Murray,et al.  Learning Sparse Overcomplete Codes for Images , 2006, J. VLSI Signal Process..

[70]  Sunita Sarawagi.  Learning with Graphical Models , 2008 .
