Learning the Compositional Nature of Visual Object Categories for Recognition

Real-world scene understanding requires recognizing object categories in novel visual scenes. This paper describes a composition system that automatically learns structured, hierarchical object representations in an unsupervised manner, without requiring manual segmentation or manual object localization. The central idea for learning object models in the challenging general case (unconstrained scenes, large intraclass variation, many categories, and little supervision) is to exploit the compositional nature of our visual world. Compositionality significantly limits the representation complexity of visual objects and renders the learning of structured object models statistically and computationally tractable. We propose a robust descriptor for local image parts and show how characteristic compositions of parts can be learned from a part vocabulary that is not category-specific but shared among all categories. Moreover, we present a Bayesian network that combines all compositional constituents with scene context and object shape. Object recognition is then formulated as a statistical inference problem in this probabilistic model.
