Efficient Coding of Visual Scenes by Grouping and Segmentation: Theoretical Predictions and Biological Evidence

Tai Sing Lee & Alan L. Yuille

Introduction

The goal of this chapter is to present computational theories of scene coding by image segmentation and to suggest their relevance for understanding visual cortical function and mechanisms. We will first introduce computational theories of image and scene segmentation and show their relationship to efficient encoding. Then we discuss and evaluate the relevant physiological data in the context of these computational frameworks. It is hoped that this will stimulate quantitative neurophysiological investigations of scene segmentation guided by computational theories.

Our central conjecture is that areas V1 and V2, in addition to encoding fine details of images in terms of filter responses, compute a segmentation of images that allows a more compact and parsimonious encoding of images in terms of the properties of regions and surfaces in the visual scene. This conjecture is based on the observation that neurons and their retinotopic arrangement in these visual areas can represent information precisely, thus furnishing an appropriate computational and representational infrastructure for this task. Segmentation detects and extracts coherent regions in an image and then encodes the image in terms of probabilistic models of the surfaces and regions in it, in the spirit of Shannon’s theory of information. This representation facilitates visual reasoning at the level of regions and their boundaries, without worrying too much about all the small details in the image.

Figure 1 gives three examples that illustrate the meaning of higher-level efficient encoding of scenes. Firstly, consider Kanizsa’s (1979) famous illusory triangle (Figure 1a).
It is simpler to explain it as a white triangle in front of, and partially occluding, three black circular discs than as three pac-men that are accidentally aligned with each other. Indeed, this simple explanation is what humans perceive; in fact, the perception of a triangle is so strong that we hallucinate the surface of the triangle as being brighter than the background and perceive sharp boundaries to the triangle even at places where there are no direct visual cues. Secondly, when confronted with the image shown in Figure 1b (Ramachandran 1988), we perceive it as a group of convex spheres mixed together with a group of concave indentations (e.g. an egg carton partly filled with eggs). This interpretation is more parsimonious than describing every detail of the intensity shading and other image features. Thirdly, at first glance, the image in Figure 1c (Gregory 1970) appears to be a collection of random dots and hence would not have a simple encoding. But the encoding becomes greatly simplified once the viewer perceives the Dalmatian dog and can invoke a dog model. The viewer will latch on to this interpretation whenever he or she sees the image again, underscoring the powerful interaction between memory and perception when generating an efficient perceptual description.

These three examples suggest that we can achieve a tremendous amount of data compression by interpreting images in terms of the structure of the scene. They suggest a succession of increasingly compact and semantically meaningful codes as we move up the visual hierarchy. These codes go beyond efficient coding of images based on Gabor wavelet responses (Daugman 1985, Lee 1996) or independent components (Olshausen and Field 1996, Bell and Sejnowski 1997, Lewicki and Olshausen 1999).
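The compression argument can be made concrete with a toy calculation. The sketch below assumes a simple two-part code of the minimum-description-length kind, in the spirit of Shannon coding but not the specific scheme developed in this chapter: each region's pixels are coded with a Gaussian fitted to that region, each region's mean and standard deviation cost a fixed 32 bits, and the region map costs an assumed 3 bits per boundary pixel. The function names `gaussian_code_length` and `description_length`, and all constants, are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_code_length(x, mu, sigma, delta=1.0):
    """Shannon code length (bits) of values x under N(mu, sigma^2),
    quantized to bins of width delta: sum of -log2(pdf(x) * delta)."""
    nats = (0.5 * np.log(2 * np.pi * sigma**2)
            + (x - mu) ** 2 / (2 * sigma**2)
            - np.log(delta))
    return float(np.sum(nats) / np.log(2))


def description_length(image, labels, param_bits=32):
    """Toy two-part code: region parameters + residuals + region map (bits)."""
    total = 0.0
    for r in np.unique(labels):
        pixels = image[labels == r]
        mu = pixels.mean()
        sigma = max(pixels.std(), 1.0)  # floor keeps code lengths positive at delta=1
        total += 2 * param_bits          # cost of sending mu and sigma
        total += gaussian_code_length(pixels, mu, sigma)
    # Crude region-map cost (assumption): 3 bits per pixel whose right or
    # lower neighbour carries a different label, as in an 8-direction chain code.
    boundary = np.zeros(labels.shape, dtype=bool)
    boundary[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    boundary[:-1, :] |= labels[:-1, :] != labels[1:, :]
    total += 3.0 * boundary.sum()
    return total


# Toy scene: two flat surfaces (luminance 50 and 200) plus sensor noise.
rng = np.random.default_rng(0)
image = np.full((64, 64), 50.0)
image[:, 32:] = 200.0
image += rng.normal(0.0, 5.0, image.shape)

one_region = np.zeros(image.shape, dtype=int)  # no segmentation
two_regions = one_region.copy()
two_regions[:, 32:] = 1                        # the "correct" segmentation

bits_flat = description_length(image, one_region)
bits_seg = description_length(image, two_regions)
print(f"unsegmented: {bits_flat / image.size:.2f} bits/pixel")
print(f"segmented:   {bits_seg / image.size:.2f} bits/pixel")
```

Under these assumptions the segmented description should need roughly half the bits of the unsegmented one: the residual variance inside each region (about 25) is far smaller than the variance of the whole image (about 5,650), and this saving easily pays for the extra region parameters and the boundary.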
In this chapter, we will concentrate on image segmentation, the process that partitions an image into regions, producing a clear delineation of the boundaries between regions and a labelling of the properties of the regions. The definition of “regions” is a flexible one. Because this chapter focuses on early visual processing, a region is defined to be a part of an image that is characterized by a set of (approximately) homogeneous visual cues, such as color, luminance, or texture. These regions can correspond to 3D surfaces in the visual scene, or they can be parts of a 3D surface defined by (approximately) constant texture, albedo, or color (e.g. the red letters “No Parking” on a white stop sign). Based on a single image, however, it is often difficult to distinguish between these two interpretations. At a higher level of vision, the definition of a region is more complex and can involve hierarchical structures of objects and scenes.

The approach we have taken stems from the following computational perspective about the function of the visual system. We hold it to be self-evident that the purpose of the visual system is to interpret input images in terms of objects

MIT Press
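As a minimal sketch of what a segmentation in this sense outputs (a partition into regions, the boundaries between them, and a label of each region's properties), the code below uses plain region growing on luminance: 4-connected neighbouring pixels are merged when their values differ by less than a tolerance `tol`. This homogeneity test, the function names, and the parameter values are illustrative assumptions and are far cruder than the probabilistic segmentation models discussed in this chapter; for instance, because only neighbouring differences are tested, a slow luminance ramp can leak into a single region.

```python
from collections import deque

import numpy as np


def region_grow(image, tol):
    """Label 4-connected regions whose neighbouring pixels differ by < tol."""
    image = np.asarray(image, dtype=float)   # avoid unsigned-integer wraparound
    h, w = image.shape
    labels = np.full((h, w), -1, dtype=int)  # -1 marks "not yet assigned"
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] >= 0:
                continue
            labels[sy, sx] = next_label       # seed a new region
            queue = deque([(sy, sx)])
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and labels[ny, nx] < 0
                            and abs(image[ny, nx] - image[y, x]) < tol):
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels


def region_properties(image, labels):
    """Per-region area and mean luminance (the 'labelling of properties')."""
    return {int(r): {"area": int(np.sum(labels == r)),
                     "mean": float(image[labels == r].mean())}
            for r in np.unique(labels)}


def boundary_map(labels):
    """Mark pixels whose right or lower neighbour belongs to another region."""
    b = np.zeros(labels.shape, dtype=bool)
    b[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    b[:-1, :] |= labels[:-1, :] != labels[1:, :]
    return b


# Usage: a bright square surface on a dark background, with mild noise.
rng = np.random.default_rng(1)
image = np.full((30, 30), 50.0)
image[10:20, 10:20] = 200.0
image += rng.normal(0.0, 2.0, image.shape)

labels = region_grow(image, tol=20.0)
props = region_properties(image, labels)
edges = boundary_map(labels)
print(len(props), props[1], int(edges.sum()))
```

The tolerance trades off over- against under-segmentation: it must exceed typical noise differences between neighbours yet stay below the luminance step at a true region boundary.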

[1] F. H. Adler. Cybernetics, or Control and Communication in the Animal and the Machine. 1949.

[2] S. W. Kuffler. Discharge patterns and functional organization of mammalian retina. Journal of Neurophysiology, 1953.

[3] Vision Research. Nature, 1961.

[4] R. Gregory. The Intelligent Eye. 1970.

[5] B. Julesz et al. Experiments in the visual perception of texture. Scientific American, 1975.

[6] L. Maffei et al. The unresponsive regions of visual cortical receptive fields. Vision Research, 1976.

[7] D. Hubel et al. Ferrier lecture: functional architecture of macaque monkey visual cortex. Proceedings of the Royal Society of London, Series B, Biological Sciences, 1977.

[8] T. Wiesel et al. Functional architecture of macaque monkey visual cortex. 1977.

[9] J. Nelson et al. Orientation-selective inhibition from beyond the classic visual receptive field. Brain Research, 1978.

[10] James L. McClelland et al. An interactive activation model of context effects in letter perception: I. An account of basic findings. 1981.

[11] R. von der Heydt et al. Illusory contours and cortical neuron responses. Science, 1984.

[12] Donald Geman et al. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1984.

[13] S. Ullman. Visual routines. Cognition, 1984.

[14] J. Daugman. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Journal of the Optical Society of America A, Optics and Image Science, 1985.

[15] S. Grossberg et al. Neural dynamics of perceptual grouping: textures, boundaries, and emergent segmentations. Perception & Psychophysics, 1985.

[16] R. Young. Gaussian derivative theory of spatial vision: analysis of cortical cell receptive field line-weighting profiles. 1985.

[17] Ennio Mingolla et al. Neural dynamics of perceptual grouping: textures, boundaries, and emergent segmentations. 1985.

[18] R. Desimone et al. Selective attention gates visual processing in the extrastriate cortex. Science, 1985.

[19] T. Wiesel et al. Relationships between horizontal interactions and functional architecture in cat striate cortex as revealed by cross-correlation analysis. The Journal of Neuroscience, 1986.

[20] C. Koch et al. Analog "neuronal" networks in early vision. Proceedings of the National Academy of Sciences of the United States of America, 1986.

[21] Berthold K. P. Horn. Robot Vision. MIT Electrical Engineering and Computer Science Series, 1986.

[22] John F. Canny et al. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986.

[23] Andrew Blake et al. Visual Reconstruction. MIT Press, 1987.

[24] J. P. Jones et al. An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. Journal of Neurophysiology, 1987.

[25] Stephen Grossberg et al. Competitive learning: from interactive activation to adaptive resonance. Cogn. Sci., 1987.

[26] John G. Daugman et al. Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression. IEEE Trans. Acoust. Speech Signal Process., 1988.

[27] A. Treisman et al. Feature analysis in early vision: evidence from search asymmetries. Psychological Review, 1988.

[28] Stuart Geman et al. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. 1988.

[29] V. S. Ramachandran et al. Perception of shape from shading. Nature, 1988.

[30] James L. McClelland et al. An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. 1988.

[31] R. von der Heydt et al. Mechanisms of contour perception in monkey visual cortex. I. Lines of pattern discontinuity. The Journal of Neuroscience, 1989.

[32] R. von der Heydt et al. Mechanisms of contour perception in monkey visual cortex. II. Contours bridging gaps. The Journal of Neuroscience, 1989.

[33] D. Mumford et al. Optimal approximations by piecewise smooth functions and associated variational problems. 1989.

[34] W. Singer et al. Oscillatory responses in cat visual cortex exhibit inter-columnar synchronization which reflects global stimulus properties. Nature, 1989.

[35] D. V. van Essen et al. Neuronal responses to static texture patterns in area V1 of the alert macaque monkey. Journal of Neurophysiology, 1992.

[36] Luigi Ambrosio et al. On the approximation of free discontinuity problems. 1992.

[37] Tai Sing Lee et al. Texture segmentation by minimizing vector-valued energy functionals: the coupled-membrane model. ECCV, 1992.

[38] Joseph J. Atick et al. What does the retina know about natural scenes? Neural Computation, 1992.

[39] D. Mumford et al. On the computational architecture of the neocortex. II. The role of cortico-cortical loops. Biological Cybernetics, 1992.

[40] Victor A. F. Lamme et al. Organization of texture segregation processing in primate visual cortex. Visual Neuroscience, 1993.

[41] Michael J. Hawken et al. Macaque V1 neurons can signal ‘illusory’ contours. Nature, 1993.

[42] David Mumford et al. Filtering, segmentation and depth. Lecture Notes in Computer Science, 1993.

[43] C. Li et al. Extensive integration field beyond the classical receptive field of cat's striate cortical neurons: classification and tuning properties. Vision Research, 1994.

[44] R. Desimone et al. Parallel neuronal mechanisms for short-term memory. Science, 1994.

[45] Christoph von der Malsburg et al. The correlation theory of brain function. 1994.

[46] Terrence J. Sejnowski et al. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 1995.

[47] T. S. Lee. A Bayesian framework for understanding texture segmentation in the primary visual cortex. Vision Research, 1995.

[48] Lance R. Williams et al. Stochastic completion fields: a neural model of illusory contour shape and salience. Neural Computation, 1995.

[49] R. Desimone et al. Neural mechanisms of selective visual attention. Annual Review of Neuroscience, 1995.

[50] Gerhard Winkler et al. Image Analysis, Random Fields and Dynamic Monte Carlo Methods: A Mathematical Introduction. Applications of Mathematics, 1995.

[51] Victor A. F. Lamme. The neurophysiology of figure-ground segregation in primary visual cortex. The Journal of Neuroscience, 1995.

[52] W. Singer et al. Visual feature integration and the temporal correlation hypothesis. Annual Review of Neuroscience, 1995.

[53] J. H. Maunsell et al. The brain's visual world: representation of visual targets in cerebral cortex. Science, 1995.

[54] Tai Sing Lee et al. Image representation using 2D Gabor wavelets. IEEE Trans. Pattern Anal. Mach. Intell., 1996.

[55] Victor A. F. Lamme et al. Contextual modulation in primary visual cortex. The Journal of Neuroscience, 1996.

[56] David J. Field et al. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 1996.

[57] R. C. Reid et al. Efficient coding of natural scenes in the lateral geniculate nucleus: experimental test of a computational theory. The Journal of Neuroscience, 1996.

[58] A. Yuille et al. A theoretical framework for visual motion. 1996.

[59] M. Sur et al. Orientation maps of subjective contours in visual cortex. Science, 1996.

[60] Lance R. Williams et al. Stochastic completion fields: a neural model of illusory contour shape and salience. Neural Computation, 1997.

[61] R. Desimone et al. Neural mechanisms of spatial selective attention in areas V1, V2, and V4 of macaque visual cortex. Journal of Neurophysiology, 1997.

[62] Song-Chun Zhu et al. Prior learning and Gibbs reaction-diffusion. IEEE Trans. Pattern Anal. Mach. Intell., 1997.

[63] Jitendra Malik et al. Normalized cuts and image segmentation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1997.

[64] Terrence J. Sejnowski et al. The “independent components” of natural scenes are edge filters. Vision Research, 1997.

[65] Victor A. F. Lamme et al. Neuronal synchrony does not represent texture segregation. Nature, 1998.

[66] Sean P. MacEvoy et al. Integration of surface information in primary visual cortex. Nature Neuroscience, 1998.

[67] Victor A. F. Lamme et al. Figure-ground activity in primary visual cortex is suppressed by anesthesia. Proceedings of the National Academy of Sciences of the United States of America, 1998.

[68] P. R. Roelfsema et al. Object-based attention in the primary visual cortex of the macaque monkey. Nature, 1998.

[69] D. Mumford et al. The role of the primary visual cortex in higher level vision. Vision Research, 1998.

[70] Frances S. Chance et al. Synaptic depression and the temporal response characteristics of V1 cells. The Journal of Neuroscience, 1998.

[71] Andrea Braides. Approximation of Free-Discontinuity Problems. 1998.

[72] J. M. Hupé et al. Cortical feedback improves discrimination between figure and background by V1, V2 and V3 neurons. Nature, 1998.

[73] Song-Chun Zhu et al. Stochastic jump-diffusion process for computing medial axes in Markov random fields. IEEE Trans. Pattern Anal. Mach. Intell., 1999.

[74] Wolf Singer et al. Neuronal synchrony: a versatile code for the definition of relations? Neuron, 1999.

[75] M. Paradiso et al. Neural correlates of perceived brightness in the retina, lateral geniculate nucleus, and striate cortex. The Journal of Neuroscience, 1999.

[76] Rajesh P. N. Rao et al. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. 1999.

[77] Bruno A. Olshausen et al. Probabilistic framework for the adaptation and comparison of image codes. 1999.

[78] Michael N. Shadlen et al. Synchrony unbound: a critical evaluation of the temporal binding hypothesis. Neuron, 1999.

[79] R. von der Heydt et al. Coding of border ownership in monkey visual cortex. The Journal of Neuroscience, 2000.

[80] Rainer Goebel et al. Neural synchrony correlates with surface segregation rules. Nature, 2000.

[81] C. Gilbert et al. Spatial distribution of contextual interactions in primary visual cortex and in visual perception. Journal of Neurophysiology, 2000.

[82] J. Bakin et al. Visual responses in monkey areas V1 and V2 to three-dimensional surface configurations. The Journal of Neuroscience, 2000.

[83] Tai Sing Lee et al. Informatics of spike trains in neuronal ensemble. Proceedings of the 22nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (Cat. No. 00CH37143), 2000.

[84] T. S. Lee et al. Dynamics of subjective contour formation in the early visual cortex. Proceedings of the National Academy of Sciences of the United States of America, 2001.

[85] C. Hung et al. Building surfaces from borders in areas 17 and 18 of the cat. Vision Research, 2001.

[86] Leslie G. Ungerleider et al. Contextual modulation in primary visual cortex of macaques. The Journal of Neuroscience, 2001.

[87] Adrienne L. Fairhall et al. Efficiency and ambiguity in an adaptive neural code. Nature, 2001.

[88] Harry Shum et al. Image segmentation by data driven Markov chain Monte Carlo. Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001), 2001.

[89] Jitendra Malik et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001), 2001.

[90] Ronen Basri et al. Segmentation and boundary detection using multiscale intensity measurements. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), 2001.

[91] Refractor. Vision. The Lancet, 2000.

[92] W. Singer et al. Rapid feature selective neuronal synchronization through correlated latency shifting. Nature Neuroscience, 2001.

[93] Sean P. MacEvoy et al. Perception of brightness and brightness illusions in the macaque monkey. The Journal of Neuroscience, 2002.

[94] J. B. Levitt et al. Circuits for local and global signal integration in primary visual cortex. The Journal of Neuroscience, 2002.

[95] Shimon Ullman et al. Class-specific, top-down segmentation. ECCV, 2002.

[96] J. B. Levitt et al. Anatomical origins of the classical receptive field and modulatory surround field of single neurons in macaque visual cortical area V1. Progress in Brain Research, 2002.

[97] D. V. van Essen et al. Scene segmentation and attention in primate cortical areas V1 and V2. Journal of Neurophysiology, 2002.

[98] A. Parker et al. A specialization for relative disparity in V2. Nature Neuroscience, 2002.

[99] Eero P. Simoncelli. Vision and the statistics of the visual environment. Current Opinion in Neurobiology, 2003.

[100] Tai Sing Lee et al. Hierarchical Bayesian inference in the visual cortex. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 2003.

[101] L. Pessoa et al. Filling-in: from perceptual completion to cortical reorganization. 2003.

[102] Adrian Barbu et al. Graph partition by Swendsen-Wang cuts. Proceedings of the Ninth IEEE International Conference on Computer Vision, 2003.

[103] Jianbo Shi et al. Multiclass spectral clustering. Proceedings of the Ninth IEEE International Conference on Computer Vision, 2003.

[104] Yvan G. Leclerc et al. Constructing simple stable descriptions for image partitioning. International Journal of Computer Vision, 1989.

[105] T. Lee et al. The role of early visual cortex in visual integration: a neural model of recurrent interaction. The European Journal of Neuroscience, 2004.

[106] Jianbo Shi et al. Segmentation given partial grouping constraints. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004.

[107] Victor A. F. Lamme et al. Synchrony and covariation of firing rates in the primary visual cortex during contour grouping. Nature Neuroscience, 2004.

[108] Song-Chun Zhu et al. What are textons? International Journal of Computer Vision, 2005.

[109] D. Mumford. On the computational architecture of the neocortex. Biological Cybernetics, 2004.

[110] R. Eckhorn et al. Coherent oscillations: a mechanism of feature linking in the visual cortex? Biological Cybernetics, 1988.

[111] Peter N. Belhumeur et al. A Bayesian approach to binocular stereopsis. International Journal of Computer Vision, 1996.

[112] Alan L. Yuille et al. A common framework for image segmentation. International Journal of Computer Vision, 1990.

[113] Nicole C. Rust et al. Do we know what the early visual system does? The Journal of Neuroscience, 2005.

[114] G. DeAngelis et al. Does neuronal synchrony underlie visual feature grouping? Neuron, 2005.

[115] Robert T. Collins et al. Corrected Laplacians: closer cuts and segmentation with shape priors. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.

[116] Zhuowen Tu et al. Image parsing: unifying segmentation, detection, and recognition. International Journal of Computer Vision, 2005.

[117] A. B. Bonds et al. Gamma oscillation maintains stimulus structure-dependent synchronization in cat visual cortex. Journal of Neurophysiology, 2005.

[118] Gary L. Miller et al. Graph partitioning by spectral rounding: applications in image segmentation and clustering. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), 2006.

[119] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 1948.

[120] P. Grünwald. The Minimum Description Length Principle. Adaptive Computation and Machine Learning series, 2007.