Embedding spatial information into image content description for scene retrieval

This article presents Δ-TSR, an image content representation that describes spatial layout through triangular relationships between visual entities, which can be symbolic objects or low-level visual features. A semi-local implementation of Δ-TSR is also proposed, making the description robust to viewpoint changes. We evaluate Δ-TSR for image retrieval under the query-by-example paradigm, on contents represented with interest points in a bag-of-features model: it improves on state-of-the-art techniques in both retrieval quality and execution time, and it is scalable. Finally, its effectiveness is evaluated on a topical scenario dedicated to scene retrieval in datasets of city landmarks.
