Visual grouping and object recognition

We develop a two-stage framework for parsing and understanding images. The first stage is image segmentation: grouping pixels into regions of coherent color and texture. The second stage is recognition: comparing assemblies of such regions, hypothesized to correspond to a single object, with stored prototype views. We treat segmentation as an optimization problem: partition the image into regions such that similarity is high within a region and low across regions. This is formalized as minimizing the normalized cut between regions. Using ideas from spectral graph theory, the minimization can be posed as an eigenvalue problem. Visual attributes such as color, texture, contour, and motion are encoded in this framework through suitable specification of the graph edge weights. The recognition problem requires us to compare assemblies of image regions with previously stored prototypical views of known objects. We have devised a novel shape-matching algorithm based on a relationship descriptor called the shape context. This enables us to compute similarity measures between shapes which, together with similarity measures for texture and color, can be used for object recognition. The shape-matching algorithm has yielded excellent results on a variety of 2D and 3D recognition problems.
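To make the segmentation step concrete, here is a minimal sketch of a normalized-cut bipartition posed as an eigenvalue problem. It is an illustration under stated assumptions, not the authors' implementation: the function names (`normalized_cut_bipartition`, `ncut_value`), the toy scalar "pixel features", the Gaussian edge weights, and the threshold at zero are all choices made for this example. A real system would build the weights from color, texture, contour, and motion cues.

```python
import numpy as np

def normalized_cut_bipartition(W):
    """Split a graph in two using the spectral relaxation of the normalized cut.

    W: symmetric (n, n) array of non-negative edge weights.
    Returns a boolean array of length n marking one side of the partition.
    """
    d = W.sum(axis=1)                  # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)      # assumes every node has positive degree
    # Symmetric normalized Laplacian D^{-1/2} (D - W) D^{-1/2}
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues returned in ascending order
    # Second-smallest eigenvector z, mapped back to the generalized
    # problem (D - W) y = lambda D y via y = D^{-1/2} z.
    y = d_inv_sqrt * eigvecs[:, 1]
    return y > 0                       # thresholding at zero is a simple choice

def ncut_value(W, mask):
    """Normalized cut: cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V)."""
    cut = W[mask][:, ~mask].sum()
    return cut / W[mask].sum() + cut / W[~mask].sum()

# Toy data (hypothetical): six scalar features forming two groups.
features = np.array([0.0, 0.1, 0.2, 2.0, 2.1, 2.2])
sigma = 0.5                            # assumed affinity bandwidth
diff = features[:, None] - features[None, :]
W = np.exp(-diff**2 / (2 * sigma**2))  # Gaussian similarity as edge weight

mask = normalized_cut_bipartition(W)
print(mask, ncut_value(W, mask))
```

The sign of an eigenvector is arbitrary, so which group is marked `True` can vary between runs or library versions; only the grouping itself is meaningful. Segmenting into more than two regions would require applying the split recursively or using several eigenvectors, which this sketch does not attempt.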
