On-Device Mobile Landmark Recognition Using Binarized Descriptor with Multifeature Fusion

With the exponential growth in high-performance mobile devices, on-device Mobile Landmark Recognition (MLR) has recently attracted increasing research attention. However, the latency and accuracy of automatic recognition remain bottlenecks to its real-world use. In this article, we introduce a novel framework that combines interactive image segmentation with multifeature fusion to achieve high-accuracy MLR. First, we propose an effective vector binarization method that reduces the memory usage of image descriptors extracted on-device while maintaining recognition accuracy comparable to that of the original descriptors. Second, we design a location-aware fusion algorithm that fuses multiple visual features into a compact yet discriminative image descriptor to improve on-device efficiency. Third, we develop a user-friendly interaction scheme that enables interactive foreground/background segmentation to substantially improve recognition accuracy. Experimental results demonstrate the effectiveness of the proposed algorithms for on-device MLR applications.
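To make the memory argument behind vector binarization concrete, the following is a minimal sketch and not the method proposed in the article, whose details the abstract does not give. It assumes a generic scheme: each descriptor dimension is thresholded at its median over the database, the resulting bits are packed into an integer, and codes are compared by Hamming distance. The function names `per_dim_median`, `binarize`, and `hamming`, the 64-dimensional descriptors, and the random data are all hypothetical.

```python
import random

def per_dim_median(vectors):
    """Per-dimension (upper) median over a list of equal-length vectors."""
    dims = len(vectors[0])
    medians = []
    for d in range(dims):
        column = sorted(v[d] for v in vectors)
        medians.append(column[len(column) // 2])
    return medians

def binarize(vec, thresholds):
    """Pack one bit per dimension into an int: bit i is 1 when vec[i] > thresholds[i]."""
    code = 0
    for i, (value, threshold) in enumerate(zip(vec, thresholds)):
        if value > threshold:
            code |= 1 << i
    return code

def hamming(a, b):
    """Number of differing bits between two packed binary codes."""
    return bin(a ^ b).count("1")

# Hypothetical data: 200 random 64-dimensional descriptors (assumption).
random.seed(0)
DIMS = 64
database = [[random.random() for _ in range(DIMS)] for _ in range(200)]
thresholds = per_dim_median(database)
codes = [binarize(v, thresholds) for v in database]

# Query: a slightly perturbed copy of one database descriptor.
query = [x + random.gauss(0.0, 0.05) for x in database[17]]
query_code = binarize(query, thresholds)

# Rank database entries by Hamming distance to the query code.
nearest = min(range(len(codes)), key=lambda i: hamming(codes[i], query_code))

# Storage per descriptor: 64 float32 values versus 64 bits.
float_bytes = DIMS * 4
binary_bytes = DIMS // 8
```

Under these assumptions a 64-dimensional float32 descriptor shrinks from 256 bytes to 8 bytes, and matching reduces to XOR and bit counting, which is cheap on mobile hardware. How much recognition accuracy survives depends on the actual binarization and descriptor used, which this sketch does not reproduce.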
