Residual enhanced visual vector as a compact signature for mobile visual search

Many mobile visual search (MVS) systems transmit query data from a mobile device to a remote server and search a database hosted on the server. In this paper, we present a new architecture for searching a large database directly on a mobile device, which enables network-independent, low-latency, and privacy-protected image retrieval. A key challenge for on-device retrieval is storing a large database in the limited RAM of a mobile device. To address this challenge, we develop a new compact, discriminative image signature called the Residual Enhanced Visual Vector (REVV) that is optimized for sets of local features that are fast to extract on mobile devices. REVV outperforms existing compact database constructions in the MVS setting and, in large-scale retrieval, attains accuracy similar to that of a Vocabulary Tree that uses 25x more memory. We have used REVV to build a mobile augmented reality system for accurate, large-scale landmark recognition. Fast on-device search with REVV enables our system to achieve latencies around 1 s per query regardless of external network conditions. The compactness of REVV also allows it to function well as a low-bitrate signature that can be transmitted to or from a remote server to efficiently expand the local database search when required.
