Unsupervised learning of high-order structural semantics from images

Structural semantics are fundamental to understanding both natural and man-made objects, from languages to buildings. They manifest as repeated structures or patterns and are often captured in images. Finding repeated patterns in images therefore has important applications in scene understanding, 3D reconstruction, image retrieval, and image compression. Previous approaches to visual-pattern mining limited themselves to frequently co-occurring features within a small neighborhood of an image. However, the semantics of a visual pattern are typically defined by specific spatial relationships between features, regardless of their spatial proximity. In this paper, semantics are represented as visual elements and the geometric relationships between them. A novel unsupervised learning algorithm finds pairwise associations of visual elements that have consistent geometric relationships sufficiently often. The algorithms are efficient: maximal matchings are determined without combinatorial search. High-order structural semantics are extracted by mining patterns composed of pairwise, spatially consistent associations of visual elements. We demonstrate the effectiveness of our approach for discovering repeated visual patterns on a variety of image collections.
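
To make the idea of pairwise associations with consistent geometric relationships concrete, the following is a minimal illustrative sketch, not the algorithm of the paper. It assumes each feature is a hypothetical `(word_id, x, y)` tuple, describes the geometric relationship between two features by their displacement quantized into bins of `bin_size` pixels, and keeps the associations seen at least `min_support` times. The function name and both parameters are assumptions made for illustration. The sketch enumerates all feature pairs by brute force, so it does not reflect the non-combinatorial matching described above.

```python
from collections import Counter
from itertools import combinations


def mine_pairwise_associations(images, bin_size=10.0, min_support=2):
    """Count (word_a, word_b, dx_bin, dy_bin) associations and keep frequent ones.

    images: list of images, each a list of (word_id, x, y) feature tuples.
    bin_size and min_support are illustrative parameters, not from the paper.
    """
    counts = Counter()
    for features in images:
        for a, b in combinations(features, 2):
            # Order the pair canonically so (a, b) and (b, a) share one key.
            if a > b:
                a, b = b, a
            (wa, xa, ya), (wb, xb, yb) = a, b
            # Quantize the displacement so small jitter falls in the same bin.
            dx = round((xb - xa) / bin_size)
            dy = round((yb - ya) / bin_size)
            counts[(wa, wb, dx, dy)] += 1
    # An association is kept only if its geometry recurs often enough.
    return {key: n for key, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    facade = [("window", 0, 0), ("door", 20, 0),
              ("window", 100, 0), ("door", 120, 0)]
    print(mine_pairwise_associations([facade]))
```

In this toy input, the only association that recurs is a window 20 pixels to the left of a door, even though the two instances are 100 pixels apart. This is the kind of relationship that a small-neighborhood co-occurrence test would treat as two unrelated events.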
