Mobile Visual Recognition on Smartphones

This paper addresses the recognition of large-scale outdoor scenes on smartphones by fusing outputs of inertial sensors and computer vision techniques. The main contributions can be summarized as follows. Firstly, we propose an ORD (overlap region divide) method to plot image position area, which is fast enough to find the nearest visiting area and can also reduce the search range compared with the traditional approaches. Secondly, the vocabulary tree-based approach is improved by introducing GAGCC (gravity-aligned geometric consistency constraint). Our method involves no operation in the high-dimensional feature space and does not assume a global transform between a pair of images. Thus, it substantially reduces the computational complexity and memory usage, which makes the city scale image recognition feasible on the smartphone. Experiments on a collected database including 0.16 million images show that the proposed method demonstrates excellent recognition performance, while maintaining the average recognition time about 1 s.

[1]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[2]  Jiri Matas,et al.  Matching with PROSAC - progressive sample consensus , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[3]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[4]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[5]  Ankita Kumar,et al.  Experiments on visual loop closing using vocabulary trees , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[6]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[7]  Bernd Girod,et al.  Outdoors augmented reality on mobile phone using loxel-based visual feature organization , 2008, MIR '08.

[8]  Harry Shum,et al.  A multi-sample, multi-tree approach to bag-of-words image representation for image retrieval , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[9]  Bernd Girod,et al.  CHoG: Compressed histogram of gradients A low bit-rate feature descriptor , 2009, CVPR.

[10]  Michael Isard,et al.  Bundling features for large scale partial-duplicate web image search , 2009, CVPR.

[11]  Marc Pollefeys,et al.  Handling Urban Location Recognition as a 2D Homothetic Problem , 2010, ECCV.

[12]  Bernd Girod,et al.  Fast geometric re-ranking for image-based retrieval , 2010, 2010 IEEE International Conference on Image Processing.

[13]  Roland Siegwart,et al.  BRISK: Binary Robust invariant scalable keypoints , 2011, 2011 International Conference on Computer Vision.

[14]  Ming Yang,et al.  Efficient re-ranking in vocabulary tree based image retrieval , 2011, 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR).

[15]  Selim Benhimane,et al.  Inertial sensor-aligned visual feature descriptors , 2011, CVPR 2011.

[16]  Xin Chen,et al.  City-scale landmark identification on mobile devices , 2011, CVPR 2011.

[17]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[18]  Anas Al-Nuaimi,et al.  Mobile Visual Location Recognition , 2013 .

[19]  Ming Yang,et al.  Contextual weighting for vocabulary tree based image retrieval , 2011, 2011 International Conference on Computer Vision.