Learning to detect objects in images via a sparse, part-based representation

We study the problem of detecting objects in still, gray-scale images. Our primary focus is a learning-based approach that uses a sparse, part-based representation. A vocabulary of distinctive object parts is automatically constructed from a set of sample images of the object class of interest; images are then represented using parts from this vocabulary, together with the spatial relations observed among the parts. Based on this representation, a learning algorithm automatically learns to detect instances of the object class in new images. The approach can be applied to any object with distinguishable parts in a relatively fixed spatial configuration; it is evaluated here on difficult sets of real-world images containing side views of cars, and it successfully detects objects under varying conditions amidst background clutter and mild occlusion. Evaluating object detection approaches raises several important methodological issues that have not been satisfactorily addressed in previous work. A secondary focus of this paper is to highlight these issues and to develop rigorous evaluation standards for the object detection problem. A critical evaluation of our approach under the proposed standards is presented.
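
The pipeline outlined above (build a part vocabulary, encode an image window by the parts it contains and their spatial relations, then train a classifier) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: patches are flat lists of gray values, similarity is normalized correlation, the vocabulary is built by greedy clustering, spatial relations are bucketed pairwise distances, and a plain perceptron stands in for the learning algorithm. All function names are hypothetical.

```python
import math
from itertools import combinations

def similarity(a, b):
    """Normalized correlation between two equally sized flat patches (assumed measure)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da, db = [x - ma for x in a], [y - mb for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0

def build_vocabulary(patches, threshold=0.8):
    """Greedy clustering: a patch becomes a new part unless it resembles an existing one."""
    vocab = []
    for p in patches:
        if not any(similarity(p, v) >= threshold for v in vocab):
            vocab.append(p)
    return vocab

def encode(located_patches, vocab, threshold=0.8, bucket=4):
    """Return the sparse set of active features for one image window.

    located_patches: list of ((row, col), patch) pairs.
    Features are part occurrences plus bucketed distances between part pairs.
    """
    hits = []
    for (r, c), patch in located_patches:
        for i, v in enumerate(vocab):
            if similarity(patch, v) >= threshold:
                hits.append((i, r, c))
                break
    feats = {("part", i) for i, _, _ in hits}
    for (i, r1, c1), (j, r2, c2) in combinations(hits, 2):
        dist = int(math.hypot(r1 - r2, c1 - c2)) // bucket
        feats.add(("rel", min(i, j), max(i, j), dist))
    return feats

def train_perceptron(examples, epochs=10):
    """Mistake-driven linear classifier over sparse features; labels are +1 or -1."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for feats, label in examples:
            score = b + sum(w.get(f, 0.0) for f in feats)
            if label * score <= 0:
                for f in feats:
                    w[f] = w.get(f, 0.0) + label
                b += label
    return w, b

def predict(model, feats):
    w, b = model
    return b + sum(w.get(f, 0.0) for f in feats) > 0

# Toy data: two made-up part types and one patch that matches neither.
wheel, edge, other = [0, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 1]
vocab = build_vocabulary([wheel, edge, wheel])
car_feats = encode([((10, 2), wheel), ((10, 12), wheel), ((4, 7), edge)], vocab)
bg_feats = encode([((3, 3), other), ((8, 8), edge)], vocab)
model = train_perceptron([(car_feats, 1), (bg_feats, -1)])
car_detected = predict(model, car_feats)
bg_detected = predict(model, bg_feats)
```

Storing only the active features as a set keeps the representation sparse: a window that contains a handful of parts touches a handful of weights, however large the vocabulary grows.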
