Grammar-based object representations in a scene parsing task

This paper addresses the nature of the visual representations associated with complex structured objects, and the role these representations play in perceptual organization. We use a novel experimental paradigm to probe participants' intuitions about parsing a scene consisting of overlapping two-dimensional objects. The objects are generated from an abstract two-dimensional image grammar, which specifies the set of possible configurations of object parts. We show that participants' performance on the task depends on prior experience with the object class and is based on structural cues, which indicates that structural representations exert a top-down influence on parsing. To address the question of representation type, we combine a computational model of object matching with several probabilistic representational models. Our simulations indicate that grammar-based representations derived from the original grammars explain human performance on this task better than both more restrictive exemplar-based representations and more inclusive, over-generalizing grammar-based representations.
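The contrast among the three kinds of representation compared above can be illustrated with a minimal sketch. Everything in it is a hypothetical toy chosen for illustration, not the grammar, stimuli, or models used in the study: the part names, the single constraint that the left and right attachments must differ, and the uniform scoring function `uniform_prob` are all assumptions.

```python
from itertools import product

# Hypothetical toy grammar (an assumption, not the paper's grammar):
#   OBJECT -> LEFT BODY RIGHT, where LEFT and RIGHT must be different parts.
PARTS = {
    "LEFT": ["hook", "bar"],
    "BODY": ["square", "disc"],
    "RIGHT": ["hook", "bar"],
}


def overgeneral_language():
    """Over-generalizing model: any combination of parts is allowed."""
    return set(product(PARTS["LEFT"], PARTS["BODY"], PARTS["RIGHT"]))


def grammar_language():
    """Grammar-based model: only configurations the toy grammar licenses."""
    return {cfg for cfg in overgeneral_language() if cfg[0] != cfg[2]}


def exemplar_language(training):
    """Exemplar-based model: only configurations seen during training."""
    return set(training)


def uniform_prob(config, language):
    """Uniform probability over a model's language; zero outside it."""
    return 1.0 / len(language) if config in language else 0.0


# Two hypothetical training objects, both licensed by the toy grammar.
training = [("hook", "square", "bar"), ("bar", "disc", "hook")]

novel_grammatical = ("hook", "disc", "bar")   # unseen, but licensed
ungrammatical = ("hook", "square", "hook")    # violates LEFT != RIGHT

models = {
    "exemplar": exemplar_language(training),
    "grammar": grammar_language(),
    "over-general": overgeneral_language(),
}

for name, language in models.items():
    print(
        name,
        uniform_prob(novel_grammatical, language),
        uniform_prob(ungrammatical, language),
    )
```

In this toy setting only the grammar-based model both accepts the unseen but well-formed object and rejects the ill-formed one. The exemplar model rejects both, and the over-general model accepts both. This is the qualitative pattern one would need in order to tell the three representation types apart, although the study's actual models are probabilistic and are coupled with an object-matching model.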
