Recognition of Visual Object Classes

Object recognition is about both recognizing specific objects, e.g., "That is my dog Spot," and recognizing classes of objects, e.g., "That is a dog." Our focus is on the latter problem, even though we do not offer a precise definition of what constitutes a class. In some cases, for example with human faces, the objects in a class are visually similar and form a visual object class. In other cases, say chairs, objects in the class may not look at all alike; the only similarities are in function. Recognition of functional object classes requires higher-level cognitive reasoning; we restrict our attention here to visual object classes. The main difficulty in object recognition is the problem of invariance. The pixel representation provided by the camera depends on the lighting conditions, object pose, camera position, etc. Further, there is inherent variability between different instances of the same object class. Our approach to this problem is to model an object class as a set of local parts arranged in a deformable spatial configuration. For example, human faces consist of parts such as the eyes, nose, and mouth, whose positions in the image plane vary with pose and expression as well as from individual to individual. The allowed object deformations are represented through shape statistics, which are learned from examples. Instances of an object in an image are detected by finding the appropriate features in the correct spatial configuration. The algorithm is robust with respect to partial occlusion, detector false alarms, and missed features. A 94% success rate is demonstrated for the problem of locating quasi-frontal views of faces in cluttered scenes.
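
The idea of scoring part configurations against learned shape statistics can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a single Gaussian over translation-normalized part positions, an exhaustive search over candidate detections, and no handling of scale, rotation, occlusion, or missed features. All function names (`normalize`, `learn_shape_stats`, `shape_cost`, `best_configuration`) and the regularization constant are hypothetical.

```python
import itertools
import numpy as np

def normalize(points):
    """Remove translation by centering the part positions on their centroid."""
    points = np.asarray(points, dtype=float)
    return (points - points.mean(axis=0)).ravel()

def learn_shape_stats(examples, reg=1e-3):
    """Estimate mean and inverse covariance of centered part positions.

    examples: sequence of (n_parts, 2) arrays, one per training instance.
    reg: assumed ridge term; centering makes the raw covariance singular.
    """
    X = np.stack([normalize(e) for e in examples])
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + reg * np.eye(X.shape[1])
    return mean, np.linalg.inv(cov)

def shape_cost(config, mean, cov_inv):
    """Squared Mahalanobis distance of a configuration from the mean shape."""
    d = normalize(config) - mean
    return float(d @ cov_inv @ d)

def best_configuration(candidates, mean, cov_inv):
    """Pick one candidate location per part so that the shape cost is lowest.

    candidates: list with one entry per part, each a list of (x, y) detections.
    Exhaustive search; only practical for a handful of candidates per part.
    """
    best = min(itertools.product(*candidates),
               key=lambda c: shape_cost(c, mean, cov_inv))
    return np.array(best), shape_cost(best, mean, cov_inv)
```

In this sketch, detector false alarms simply appear as extra entries in `candidates`; combinations that include them are rejected because their spatial arrangement is unlikely under the learned statistics.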
