Learning to Detect Objects in Images via a Sparse, Part-Based Representation

We study the problem of detecting objects in still, gray-scale images. Our primary focus is the development of a learning- based approach to the problem that makes use of a sparse, part-based representation. A vocabulary of distinctive object parts is automatically constructed from a set of sample images of the object class of interest; images are then represented using parts from this vocabulary, together with spatial relations observed among the parts. Based on this representation, a learning algorithm is used to automatically learn to detect instances of the object class in new images. The approach can be applied to any object with distinguishable parts in a relatively fixed spatial configuration; it is evaluated here on difficult sets of real-world images containing side views of cars, and is seen to successfully detect objects in varying conditions amidst background clutter and mild occlusion. In evaluating object detection approaches, several important methodological issues arise that have not been satisfactorily addressed in previous work. A secondary focus of this paper is to highlight these issues and to develop rigorous evaluation standards for the object detection problem. A critical evaluation of our approach under the proposed standards is presented.

[1]  Michel Vidal-Naquet,et al.  Visual features of intermediate complexity and their use in classification , 2002, Nature Neuroscience.

[2]  Tomaso A. Poggio,et al.  Example-Based Object Detection in Images by Components , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[3]  Linda G. Shapiro,et al.  Computer and Robot Vision (Volume II) , 2002 .

[4]  Tomaso A. Poggio,et al.  A Trainable System for Object Detection , 2000, International Journal of Computer Vision.

[5]  Federico Girosi,et al.  Training support vector machines: an application to face detection , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6]  M. Turk,et al.  Eigenfaces for Recognition , 1991, Journal of Cognitive Neuroscience.

[7]  Alex Pentland,et al.  Probabilistic visual learning for object detection , 1995, Proceedings of IEEE International Conference on Computer Vision.

[8]  I. Biederman Recognition-by-components: a theory of human image understanding. , 1987, Psychological review.

[9]  Cordelia Schmid,et al.  Local Grayvalue Invariants for Image Retrieval , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[10]  Pietro Perona,et al.  Unsupervised Learning of Models for Recognition , 2000, ECCV.

[11]  Takeo Kanade,et al.  Neural Network-Based Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[12]  N. Littlestone Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).

[13]  Michel Vidal-Naquet,et al.  A Fragment-Based Approach to Object Representation and Classification , 2001, IWVF.

[14]  Hans P. Morevec Towards automatic visual obstacle avoidance , 1977, IJCAI 1977.

[15]  S. Palmer Hierarchical structure in perceptual representation , 1977, Cognitive Psychology.

[16]  David L. Sheinberg,et al.  Visual object recognition. , 1996, Annual review of neuroscience.

[17]  Thomas S. Huang,et al.  Face detection with information-based maximum discrimination , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  Yoshua Bengio,et al.  Object Recognition with Gradient-Based Learning , 1999, Shape, Contour and Grouping in Computer Vision.

[19]  Yali Amit,et al.  A Computational Model for Visual Selection , 1999, Neural Computation.

[20]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[21]  Narendra Ahuja,et al.  A SNoW-Based Face Detector , 1999, NIPS.

[22]  Dan Roth,et al.  Learning to Resolve Natural Language Ambiguities: A Unified Approach , 1998, AAAI/IAAI.

[23]  Dan Roth,et al.  Learning a Sparse Representation for Object Detection , 2002, ECCV.

[24]  S. Ullman High-Level Vision: Object Recognition and Visual Cognition , 1996 .

[25]  Takeo Kanade,et al.  A statistical method for 3D object detection applied to faces and cars , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[26]  D. Perrett,et al.  Recognition of objects and their component parts: responses of single units in the temporal cortex of the macaque. , 1994, Cerebral cortex.