Minimizing Binding Errors Using Learned Conjunctive Features

We have studied some of the design trade-offs governing visual representations based on spatially invariant conjunctive feature detectors, with an emphasis on the susceptibility of such systems to false-positive recognition errors (Malsburg's classical binding problem). We begin by deriving an analytical model that makes explicit how recognition performance is affected by the number of objects that must be distinguished, the number of features included in the representation, the complexity of individual objects, and the clutter load, that is, the amount of visual material in the field of view in which multiple objects must be simultaneously recognized, independent of pose, and without explicit segmentation. Using the domain of text to model object recognition in cluttered scenes, we show that with corrections for the nonuniform probability and nonindependence of text features, the analytical model achieves good fits to measured recognition rates in simulations involving a wide range of clutter loads, word sizes, and feature counts. We then introduce a greedy algorithm for feature learning, derived from the analytical model, which grows a representation by choosing those conjunctive features that are most likely to distinguish objects from the cluttered backgrounds in which they are embedded. We show that the representations produced by this algorithm are compact, decorrelated, and heavily weighted toward features of low conjunctive order. Our results provide a more quantitative basis for understanding when spatially invariant conjunctive features can support unambiguous perception in multiobject scenes, and lead to several insights regarding the properties of visual representations optimized for specific recognition tasks.
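The idea of growing a representation greedily can be illustrated with a toy sketch in the text domain. This is not the authors' algorithm: it assumes that conjunctive features are position-independent letter n-grams, that a word counts as "recognized" whenever all of its selected features appear somewhere in the scene, and that each candidate feature is scored by directly counting false positives on a handful of sample scenes rather than by the analytical model. All function names, the vocabulary, and the scenes are hypothetical.

```python
def ngram_features(word, max_order):
    """Position-independent letter n-grams of order 1..max_order in one word."""
    return {word[i:i + n]
            for n in range(1, max_order + 1)
            for i in range(len(word) - n + 1)}


def scene_features(scene, max_order):
    """Union of the n-gram features of every word in a cluttered scene."""
    feats = set()
    for word in scene.split():
        feats |= ngram_features(word, max_order)
    return feats


def count_false_positives(word_feats, scene_data, selected):
    """Count (word, scene) pairs where an absent word is still 'recognized'.

    Assumption: a word is recognized when every one of its selected
    features is present somewhere in the scene (no position information).
    """
    errors = 0
    for present_words, feats in scene_data:
        visible = feats & selected
        for word, wf in word_feats.items():
            if word not in present_words and (wf & selected) <= visible:
                errors += 1
    return errors


def greedy_select(vocab, scenes, budget, max_order=3):
    """Add one feature at a time, always the one that leaves the fewest errors."""
    word_feats = {w: ngram_features(w, max_order) for w in vocab}
    scene_data = [(set(s.split()), scene_features(s, max_order)) for s in scenes]
    # Sorting by (order, text) makes ties resolve toward low-order features.
    candidates = sorted(set().union(*word_feats.values()),
                        key=lambda f: (len(f), f))
    selected = set()
    errors = count_false_positives(word_feats, scene_data, selected)
    history = [errors]
    while errors > 0 and len(selected) < budget:
        remaining = [f for f in candidates if f not in selected]
        if not remaining:
            break
        best = min(remaining,
                   key=lambda f: count_false_positives(
                       word_feats, scene_data, selected | {f}))
        new_errors = count_false_positives(word_feats, scene_data,
                                           selected | {best})
        if new_errors >= errors:
            break  # no remaining feature helps
        selected.add(best)
        errors = new_errors
        history.append(errors)
    return selected, history


# Hypothetical toy data: anagrams share all single letters, so
# order-1 features alone cannot tell them apart in clutter.
vocab = ["cat", "act", "tac", "dog", "god"]
scenes = ["cat dog", "act god", "tac dog"]
selected, history = greedy_select(vocab, scenes, budget=10)
print(sorted(selected, key=lambda f: (len(f), f)))
print(history)
```

In this toy setting, the `history` list records the false-positive count after each added feature, so it shows how quickly a small feature set removes binding errors. The scoring function here is a brute-force stand-in; a criterion derived from the analytical model would replace `count_false_positives`.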
