On-Device Mobile Visual Location Recognition by Integrating Vision and Inertial Sensors

This paper addresses city-scale, on-device mobile visual location recognition by fusing inertial sensor data with computer vision techniques. The main contributions are as follows. First, we design an efficient vector quantization strategy that combines Transform Coding (TC) and Residual Vector Quantization (RVQ). Our method compresses a visual descriptor into only a few bytes while retaining reasonable search accuracy, which makes it feasible to manage a city-scale image database directly on a mobile device. Second, we integrate information from inertial sensors into both the generation of the Vector of Locally Aggregated Descriptors (VLAD) and the evaluation of image similarity. The resulting method is fast enough for on-device implementation and also noticeably improves location recognition accuracy. Third, we release a set of 1.295 million geo-tagged street view images with accompanying inertial sensor information, together with a challenging set of query images. These resources can serve as a new benchmark to facilitate further research in the area. Experimental results demonstrate the effectiveness of the proposed methods for on-device mobile visual location recognition.
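
To make the "few bytes per descriptor" idea concrete, the following is a minimal sketch of plain residual vector quantization only. It is not the paper's combined TC and RVQ scheme, whose details are not given here. All function names (`kmeans`, `nearest`, `train_rvq`, `encode`, `decode`) and parameters (`stages`, `k`) are hypothetical, and a simple k-means is assumed for codebook training. Each stage quantizes the residual left by the previous stage, so with at most 256 centroids per stage a descriptor costs one byte per stage.

```python
import numpy as np

def nearest(x, codebook):
    # Index of the closest centroid (squared Euclidean distance) for each row of x.
    d = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(data, k, iters=20, seed=0):
    # Plain Lloyd's k-means, assumed here as the codebook trainer.
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = nearest(data, centroids)
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

def train_rvq(data, stages=4, k=256):
    # Train one codebook per stage on the residuals of the previous stage.
    residual = data.astype(np.float64).copy()
    codebooks = []
    for s in range(stages):
        cb = kmeans(residual, k, seed=s)
        residual = residual - cb[nearest(residual, cb)]
        codebooks.append(cb)
    return codebooks

def encode(x, codebooks):
    # One uint8 index per stage (requires k <= 256), i.e. one byte per stage.
    residual = x.astype(np.float64).copy()
    codes = np.empty((len(x), len(codebooks)), dtype=np.uint8)
    for s, cb in enumerate(codebooks):
        idx = nearest(residual, cb)
        codes[:, s] = idx
        residual = residual - cb[idx]
    return codes

def decode(codes, codebooks):
    # Approximate reconstruction: sum of the selected centroid from each stage.
    return sum(cb[codes[:, s]] for s, cb in enumerate(codebooks))
```

Under these assumptions, `encode` with four stages and `k=256` stores each descriptor in four bytes, and adding stages trades storage for lower reconstruction error on the training data.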
