Limitations of using bags of complex features: Hierarchical higher-order filters fail to capture spatial configurations

One common way to represent an image is to reduce it to a collection of features. Many simple features have been proposed, such as pixel intensities and wavelet responses, but these are fundamentally unsuitable for capturing the configural relations among objects and object parts, because the spatial information associated with each feature is lost. A more recent strategy, known as “feature-hierarchy” modeling, uses overlapping, redundant features obtained by processing an image through a hierarchy of units tuned to progressively more complex properties. An open question is whether such approaches produce data structures rich enough to capture configural relations implicitly. We conducted three experiments and several computer simulations to address this question. We used four classes of objects, each derived from the simple spatial relationships present in classic Vernier and bisection acuity tasks. All human observers achieved near-perfect categorization performance after relatively few exposures to each stimulus class, and this ability transferred across several dimensions, including orientation and background context. By contrast, simulations of a feature-hierarchy model revealed poor performance for this class of models. Furthermore, the moderate categorization accuracy that the model did achieve failed to transfer across even the simplest of dimensions. These results indicate that this approach to image representation lacks a fundamental property necessary for encoding the spatial configurations of object parts.
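
As a toy illustration of the first point only (that a bag of simple features discards spatial configuration), the sketch below builds two Vernier-like binary images that differ solely in the direction of the lower segment's offset. The grid size, the helper names `vernier`, `bag_of_pixels`, and `positioned_pixels`, and the use of raw pixel values as features are assumptions made for this example; they are not the stimuli or the feature-hierarchy model used in the study.

```python
from collections import Counter


def vernier(offset, size=7):
    """Toy Vernier stimulus: an upper vertical segment in the centre column
    and a lower segment shifted horizontally by `offset` columns."""
    mid = size // 2
    img = [[0] * size for _ in range(size)]
    for row in range(size):
        if row < mid:
            img[row][mid] = 1           # upper segment
        elif row > mid:
            img[row][mid + offset] = 1  # lower segment, shifted
        # the middle row is left empty as a gap between the segments
    return img


def bag_of_pixels(img):
    """Bag of simple features: counts of pixel values, positions discarded."""
    return Counter(value for row in img for value in row)


def positioned_pixels(img):
    """Same features, but each 'on' pixel keeps its (row, column) position."""
    return {(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v}


left = vernier(-1)   # lower segment offset to the left
right = vernier(+1)  # lower segment offset to the right

# The two stimuli are indistinguishable as bags of pixel intensities...
print(bag_of_pixels(left) == bag_of_pixels(right))          # True
# ...but differ once spatial position is retained.
print(positioned_pixels(left) == positioned_pixels(right))  # False
```

The example says nothing about whether overlapping hierarchical features recover this information implicitly; that is the question the experiments and simulations address.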
