Bottom-up/top-down image parsing by attribute graph grammar

In this paper, we present an attribute graph grammar for image parsing on scenes with man-made objects, such as buildings, hallways, kitchens, and living rooms. We choose one class of primitives, 3D planar rectangles projected onto images, and six graph grammar production rules. Each production rule not only expands a node into its components but also includes a number of equations that constrain the attributes of a parent node and those of its children. Thus our graph grammar is context-sensitive. The grammar rules are applied recursively to produce a large number of objects and patterns in images, and thus the whole graph grammar is a type of generative model. The inference algorithm integrates bottom-up rectangle detection, which activates top-down prediction using the grammar rules. The final results are validated in a Bayesian framework. The output of the inference is a hierarchical parsing graph with objects, surfaces, rectangles, and their spatial relations. In the inference, the acceptance of a grammar rule means recognition of an object, and actions are taken to pass the attributes between a node and its parent through the constraint equations associated with this production rule. When an attribute is passed from a child node to a parent node, it is called bottom-up, and the opposite is called top-down.
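
The attribute passing described above can be illustrated with a minimal sketch. It is not the paper's method: the `Row` rule, the axis-aligned `Rect` primitive, the tolerance `tol`, and both function names are hypothetical, and the rectangles are kept in 2D rather than modeled as projected 3D planes. The sketch only shows how a single production rule with constraint equations might be accepted bottom-up and then used for top-down prediction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rect:
    """Terminal node: an axis-aligned rectangle (a 2D simplification)."""
    x: float
    y: float
    w: float
    h: float


@dataclass
class Row:
    """Hypothetical nonterminal: rectangles laid out left to right."""
    y: float              # shared top edge, inherited from the children
    h: float              # shared height, inherited from the children
    children: List[Rect]


def bottom_up_row(rects: List[Rect], tol: float = 1.0) -> Optional[Row]:
    """Try the rule Row -> Rect Rect ...; accept it only if the constraint
    equations (equal y and equal h across children) hold within tol."""
    if len(rects) < 2:
        return None
    y0, h0 = rects[0].y, rects[0].h
    if any(abs(r.y - y0) > tol or abs(r.h - h0) > tol for r in rects):
        return None
    # Bottom-up: attributes pass from the children to the parent.
    y = sum(r.y for r in rects) / len(rects)
    h = sum(r.h for r in rects) / len(rects)
    return Row(y=y, h=h, children=sorted(rects, key=lambda r: r.x))


def top_down_predict(row: Row) -> Rect:
    """Predict the next rectangle to the right of an accepted Row.
    Top-down: the parent's y and h pass to the proposed child."""
    prev, last = row.children[-2], row.children[-1]
    step = last.x - prev.x
    return Rect(x=last.x + step, y=row.y, w=last.w, h=row.h)
```

For instance, two detected rectangles with the same top edge and height would be accepted as a `Row`, and `top_down_predict` would then propose a third rectangle at the same spacing. In a fuller system, such a proposal would still have to be checked against the image evidence before being accepted.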
