Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts

This paper proposes a novel approach to constructing a hierarchical representation of visual input that aims to enable recognition and detection of a large number of object categories. Inspired by the principles of efficient indexing (bottom-up,), robust matching (top-down,), and ideas of compositionality, our approach learns a hierarchy of spatially flexible compositions, i.e. parts, in an unsupervised, statistics-driven manner. Starting with simple, frequent features, we learn the statistically most significant compositions (parts composed of parts), which consequently define the next layer. Parts are learned sequentially, layer after layer, optimally adjusting to the visual data. Lower layers are learned in a category-independent way to obtain complex, yet sharable visual building blocks, which is a crucial step towards a scalable representation. Higher layers of the hierarchy, on the other hand, are constructed by using specific categories, achieving a category representation with a small number of highly generalizable parts that gained their structural flexibility through composition within the hierarchy. Built in this way, new categories can be efficiently and continuously added to the system by adding a small number of parts only in the higher layers. The approach is demonstrated on a large collection of images and a variety of object categories. Detection results confirm the effectiveness and robustness of the learned parts.

[1]  Takayuki Ito,et al.  Neocognitron: A neural network model for a mechanism of visual pattern recognition , 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[2]  Gil J. Ettinger,et al.  Large hierarchical object recognition using libraries of parameterized model sub-parts , 1988, Proceedings CVPR '88: The Computer Society Conference on Computer Vision and Pattern Recognition.

[3]  John K. Tsotsos Analyzing vision at the complexity level , 1990, Behavioral and Brain Sciences.

[4]  Rakesh Mohan,et al.  Multidimensional Indexing for Recognizing Visual Shapes , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Yali Amit,et al.  A Computational Model for Visual Selection , 1999, Neural Computation.

[6]  T. Poggio,et al.  Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[7]  Bartlett W. Mel,et al.  Minimizing Binding Errors Using Learned Conjunctive Features , 2000, Neural Computation.

[8]  Gustavo Deco,et al.  Computational neuroscience of vision , 2002 .

[9]  K. Clark CONDITIONS FOR VERSATILE LEARNING , HELMHOLTZ ’ S UNCONSCIOUS INFERENCE , AND THE TASK OF PERCEPTION , 2002 .

[10]  Nathan Intrator,et al.  Towards structural systematicity in distributed, statically bound visual representations , 2003, Cogn. Sci..

[11]  Yali Amit,et al.  Sequential Learning of Reusable Parts for Object Detection , 2003 .

[12]  Donald Geman,et al.  Coarse-to-Fine Face Detection , 2004, International Journal of Computer Vision.

[13]  Thomas Serre,et al.  Object recognition with features inspired by visual cortex , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[14]  Antonio Torralba,et al.  Learning hierarchical models of scenes, objects, and parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[15]  Justus H. Piater,et al.  Statistical Learning of Visual Feature Hierarchies , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[16]  Sanja Fidler,et al.  Hierarchical Statistical Learning of Generic Parts of Object Structure , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[17]  Cordelia Schmid,et al.  Toward Category-Level Object Recognition , 2006, Toward Category-Level Object Recognition.

[18]  Ankur Agarwal,et al.  Hyperfeatures - Multilevel Local Coding for Visual Recognition , 2006, ECCV.

[19]  Shimon Ullman,et al.  Visual Classification by a Hierarchy of Extended Fragments , 2006, Toward Category-Level Object Recognition.

[20]  Yann LeCun,et al.  Large-scale Learning with SVM and Convolutional for Generic Object Categorization , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[21]  Bernt Schiele,et al.  Multiple Object Class Detection with a Generative Model , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[22]  David G. Lowe,et al.  Multiclass Object Recognition with Sparse, Localized Features , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).