A stochastic graph grammar for compositional object representation and recognition

This paper presents a hierarchical generative model for representing and recognizing compositional object categories with large intra-category variance. In this model, objects are broken into their constituent parts, and the variability of configurations and relationships between these parts is modeled by stochastic attribute graph grammars, which are embedded in an And-Or graph for each compositional object category. The model combines the power of a stochastic context-free grammar (SCFG) to express the variability of part configurations with that of a Markov random field (MRF) to represent the pictorial spatial relationships between these parts. As a generative model, it realizes different object instances of a category as traversals of the And-Or graph, each arriving at a valid configuration (by analogy, like a valid sentence in a language). The inference/recognition procedure is intimately tied to the structure of the model and follows a probabilistic formulation consisting of bottom-up detection steps for the parts, which in turn recursively activate the grammar rules for top-down verification and searches for missing parts. We present experiments comparing our results to state-of-the-art methods and demonstrate the potential of the proposed framework on compositional objects with cluttered backgrounds, using training and testing data from the public Lotus Hill and Caltech datasets.
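The generative traversal described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a toy "clock" category, and the part names, branching probabilities, `Node` class, and `sample` function are all hypothetical. An And-node expands every child (a composition of parts), an Or-node picks exactly one child according to its branch probabilities (the SCFG component), and the log-probability of the chosen branches is accumulated along the way. The MRF term over spatial relationships between parts is left out.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str                                     # "and", "or", or "leaf"
    children: list = field(default_factory=list)
    probs: list = field(default_factory=list)     # branch probabilities (Or-nodes only)


def sample(node, rng):
    """Traverse the graph top-down; return (leaf part names, log-prob of Or-choices)."""
    if node.kind == "leaf":
        return [node.name], 0.0
    if node.kind == "or":
        # Or-node: choose exactly one alternative, weighted by its probability.
        i = rng.choices(range(len(node.children)), weights=node.probs)[0]
        leaves, logp = sample(node.children[i], rng)
        return leaves, logp + math.log(node.probs[i])
    # And-node: every child must be present in the configuration.
    leaves, logp = [], 0.0
    for child in node.children:
        sub_leaves, sub_logp = sample(child, rng)
        leaves += sub_leaves
        logp += sub_logp
    return leaves, logp


# Hypothetical toy grammar for a "clock" category (illustrative values only).
hour, minute, second = (Node(n, "leaf") for n in ("hour_hand", "minute_hand", "second_hand"))
frame = Node("frame", "or",
             [Node("round_frame", "leaf"), Node("square_frame", "leaf")],
             [0.6, 0.4])
hands = Node("hands", "or",
             [Node("two_hands", "and", [hour, minute]),
              Node("three_hands", "and", [hour, minute, second])],
             [0.7, 0.3])
clock = Node("clock", "and", [frame, hands])

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        parts, logp = sample(clock, rng)
        print(parts, round(math.exp(logp), 3))
```

Each call to `sample(clock, rng)` yields one valid configuration of parts together with the probability of the grammar choices that produced it. A fuller model would add pairwise potentials between the selected parts to score their spatial layout.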
