Hybrid Indexes for Spatial-Visual Search

Due to the growth of geo-tagged images, recent web and mobile applications provide search capabilities for images that are both similar to a given query image and located within a given geographical area. In this paper, we focus on designing index structures to expedite these spatial-visual searches. We start with baseline indexes that are straightforward extensions of the popular spatial (R*-tree) and visual (LSH) index structures. Subsequently, we propose hybrid index structures that evaluate spatial and visual features in tandem. A unique challenge of spatial-visual search is that both the spatial and the visual features are inaccurate. Therefore, different traversals of the same index structure may produce different images as output, some of which are more relevant to the query than others. We compare our hybrid structures with a set of baseline indexes in both performance and result accuracy, using three real-world datasets (Flickr, Google Street View, and GeoUGV) and a large synthetic dataset. Our comprehensive experimental results demonstrate that our proposed hybrid indexes significantly outperform the baselines.
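To make the query semantics concrete, the following is a minimal brute-force sketch of a spatial-visual range query: it keeps images that fall inside a bounding box and whose visual feature vector lies within a distance threshold of the query feature. It is not one of the index structures described in the paper; the names `Image`, `spatial_visual_search`, `box`, and `max_distance`, the two-dimensional feature vectors, and the use of Euclidean distance are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Image:
    # Hypothetical record layout, assumed for illustration only.
    image_id: str
    lon: float
    lat: float
    feature: tuple  # visual feature vector (toy 2-D values here)


def spatial_visual_search(images, query_feature, box, max_distance):
    """Return ids of images inside `box` whose Euclidean feature distance
    to `query_feature` is at most `max_distance`, nearest first."""
    min_lon, min_lat, max_lon, max_lat = box
    hits = []
    for img in images:
        # Spatial predicate: the image location must fall inside the box.
        inside = min_lon <= img.lon <= max_lon and min_lat <= img.lat <= max_lat
        if not inside:
            continue
        # Visual predicate: the feature must be close enough to the query.
        d = dist(img.feature, query_feature)
        if d <= max_distance:
            hits.append((d, img.image_id))
    hits.sort()
    return [image_id for _, image_id in hits]


images = [
    Image("a", -118.28, 34.02, (0.1, 0.9)),  # inside the box, visually close
    Image("b", -118.29, 34.03, (0.8, 0.2)),  # inside the box, visually far
    Image("c", -73.98, 40.75, (0.1, 0.9)),   # visually close, outside the box
]
box = (-118.5, 33.9, -118.0, 34.2)
result = spatial_visual_search(images, (0.0, 1.0), box, 0.5)
print(result)
```

This linear scan checks every image against both predicates. An index for this kind of query aims to avoid that scan by pruning on the spatial and visual features before individual images are examined.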
