An optimal estimation approach to visual perception and learning

How does the visual system learn an internal model of the external environment? How is this internal model used during visual perception? How are occlusions and background clutter so effortlessly discounted when a familiar object is recognized? How is a particular object of interest attended to and recognized in the presence of other objects in the field of view? In this paper, we address these questions from the perspective of Bayesian optimal estimation theory. Using the concept of generative models and the statistical theory of Kalman filtering, we show how static and dynamic events occurring in the visual environment may be learned and recognized given only the input images. We also describe an extension of the Kalman filter model that can handle multiple objects in the field of view. The resulting robust Kalman filter model demonstrates how certain forms of attention can be viewed as an emergent property of the interaction between top-down expectations and bottom-up signals. Experimental results illustrate the ability of such a model to perform robust segmentation and recognition of objects and image sequences in the presence of occlusions and clutter.
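To make the idea of a robust Kalman filter concrete, the following is a minimal sketch and not the paper's implementation. It assumes a scalar hidden state `r`, a linear generative model `image = u * r + noise` with independent Gaussian pixel noise, and a hard threshold `gate` on the squared prediction error. All names (`robust_update`, `predict`, `gate`) and all numbers are hypothetical and chosen only for illustration. Pixels that disagree strongly with the top-down prediction are ignored, which is one simple way the interaction between expectation and input can discount an occluder.

```python
def predict(r, p, v, process_var):
    """Time update: propagate the state estimate and its variance one step,
    assuming the linear dynamics r_next = v * r + process noise."""
    return v * r, v * v * p + process_var


def robust_update(r_prior, p_prior, u, image, noise_var, gate=None):
    """Measurement update for a scalar state under image = u * r + noise.

    gate is an assumed threshold on the squared residual: pixels whose
    prediction error exceeds it are treated as outliers and dropped.
    With gate=None this is the ordinary Kalman measurement update.
    """
    # Bottom-up signal minus top-down prediction, pixel by pixel.
    residuals = [x - ui * r_prior for x, ui in zip(image, u)]
    keep = [1.0 if gate is None or e * e <= gate else 0.0 for e in residuals]
    # Information-form update using only the pixels that were kept.
    precision = 1.0 / p_prior + sum(k * ui * ui for k, ui in zip(keep, u)) / noise_var
    p_post = 1.0 / precision
    correction = sum(k * ui * e for k, ui, e in zip(keep, u, residuals)) / noise_var
    return r_prior + p_post * correction, p_post, keep


# Toy input: four pixels, true state 2.0, last pixel covered by an occluder.
u = [1.0, 1.0, 1.0, 1.0]
image = [2.0, 2.0, 2.0, 10.0]

r_plain, _, _ = robust_update(1.8, 1.0, u, image, noise_var=0.1)
r_robust, _, keep = robust_update(1.8, 1.0, u, image, noise_var=0.1, gate=1.0)
```

In this toy case the gated update stays near the true value while the ungated one is pulled toward the occluded pixel, and `keep` doubles as a crude segmentation mask. Note that the gating depends on a reasonable prior estimate: with a poor prediction, every pixel could be rejected.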
