Learning Appearance Based Models: Hierarchical Mixtures of Experts Approach

This paper describes a new technique for object recognition based on learning appearance models. The image is decomposed into local regions, each described by a new texture representation derived from the output of multiscale, multiorientation filter banks. We call this representation "Generalized Second Moments" because it can be viewed as a generalization of the windowed second moment matrix representation used by Gårding & Lindeberg. Class-characteristic local texture features and their global composition are learned by a hierarchical mixture of experts architecture. The technique is applied to a vehicle database consisting of 5 general car categories (Sedan, Van with back-doors, Van without back-doors, old Sedan, and Volkswagen Bug). This is a difficult problem with considerable in-class variation. Our technique achieves a 6.5% misclassification rate, compared with 17.4% for eigen-images and 15.7% for nearest neighbors.
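To make the architecture concrete, the following is a minimal sketch of the forward pass of a two-level hierarchical mixture of experts, assuming softmax gating networks and softmax experts as in the standard formulation of that model. The function name `hme_forward`, the tree shape (2 branches of 3 experts), the 8-dimensional feature vector, and the random parameters are illustrative assumptions; the abstract does not specify the tree depth, the expert form, or the training procedure, so this is not the authors' implementation. Only the 5 output classes are taken from the text.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def hme_forward(x, top_gate, branch_gates, experts):
    """Class posterior of a two-level HME (illustrative shapes, all assumed).

    x            : (d,)          feature vector for one local region
    top_gate     : (B, d)        weights of the root gating network
    branch_gates : (B, E, d)     weights of the gating network in each branch
    experts      : (B, E, C, d)  weights of each linear-softmax expert
    """
    g_top = softmax(top_gate @ x)          # (B,)   P(branch | x)
    g_branch = softmax(branch_gates @ x)   # (B, E) P(expert | branch, x)
    p_expert = softmax(experts @ x)        # (B, E, C) P(class | expert, x)
    weights = g_top[:, None] * g_branch    # (B, E) joint gate weight per expert
    # Mix the expert posteriors with the product of gate probabilities.
    return np.einsum("be,bec->c", weights, p_expert)

# Hypothetical sizes: 8-dim features, 2 branches, 3 experts each, 5 car classes.
rng = np.random.default_rng(0)
d, B, E, C = 8, 2, 3, 5
x = rng.normal(size=d)
probs = hme_forward(
    x,
    rng.normal(size=(B, d)),
    rng.normal(size=(B, E, d)),
    rng.normal(size=(B, E, C, d)),
)
predicted_class = int(np.argmax(probs))
```

Because each gate output and each expert output is a probability distribution, the mixture `probs` is itself a distribution over the 5 classes. In practice the parameters would be fitted to labeled data (for example with EM or gradient ascent on the likelihood) rather than drawn at random.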
