Estimating viewing angles in mobile street view search

Recent years have witnessed exciting progress in mobile visual search, with applications to location recognition and streaming augmented reality. Most existing systems rely on reference images drawn from street views of urban scenes. In this scenario, an interesting yet largely untouched problem is determining the viewing angle of the visual query alongside the search itself, which could benefit applications such as pruning false visual matches and accelerating streaming AR. In this paper, we study viewing angle estimation from the visual appearance of the query, which can be further improved by incorporating coarse mobile context such as gyroscope or compass readings. Our main idea is to cast the problem as scene classification, where the key design choice is a visual signature that captures the differences among viewing angles. We introduce a novel layout-based viewing angle descriptor, built on a carefully designed spatial division combined with appearance features such as color, texture, and gradient. We validate our approach on a dataset of 1232 street view images from urban areas of Manhattan, New York City, and show that the proposed descriptor outperforms several holistic image representations, including GIST, HOG, and bag-of-features with spatial pyramid matching.
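
The pipeline described above can be pictured as a grid of image cells, each summarized by color, texture, and gradient statistics, whose concatenation feeds a linear classifier over discretized viewing angles. The sketch below illustrates this idea in Python; the 3 x 4 grid, the bin counts, and the specific per-cell features are illustrative assumptions rather than the paper's exact design, and the linear SVM stands in for a LIBLINEAR-style classifier (reference [8]).

```python
import numpy as np
from sklearn.svm import LinearSVC  # LIBLINEAR-backed linear classifier


def cell_features(cell, n_bins=8):
    """Appearance features for one spatial cell: color, texture, gradient.

    `cell` is an H x W x 3 float array in [0, 1]. The concrete choices
    below (per-channel color histograms, a gradient-orientation histogram,
    and gradient-magnitude statistics as a texture proxy) are assumptions
    standing in for the paper's color/texture/gradient features.
    """
    feats = []
    # Color: one normalized histogram per channel.
    for c in range(3):
        hist, _ = np.histogram(cell[:, :, c], bins=n_bins, range=(0.0, 1.0))
        feats.append(hist / (hist.sum() + 1e-8))
    # Gradient: magnitude-weighted orientation histogram on grayscale.
    gray = cell.mean(axis=2)
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)  # orientations in [-pi, pi]
    hist, _ = np.histogram(ang, bins=n_bins, range=(-np.pi, np.pi), weights=mag)
    feats.append(hist / (hist.sum() + 1e-8))
    # Texture proxy: simple gradient-magnitude statistics.
    feats.append(np.array([mag.mean(), mag.std()]))
    return np.concatenate(feats)


def layout_descriptor(image, rows=3, cols=4):
    """Concatenate per-cell features over a rows x cols spatial division.

    Horizontal bands roughly separate ground, facade, and sky in street
    views; the 3 x 4 division here is an illustrative choice.
    """
    h, w = image.shape[:2]
    parts = []
    for r in range(rows):
        for c in range(cols):
            cell = image[r * h // rows:(r + 1) * h // rows,
                         c * w // cols:(c + 1) * w // cols]
            parts.append(cell_features(cell))
    return np.concatenate(parts)


def train_angle_classifier(images, angle_labels):
    """Viewing-angle estimation as scene classification: one class per
    discretized angle, trained with a linear SVM."""
    X = np.stack([layout_descriptor(im) for im in images])
    return LinearSVC().fit(X, angle_labels)
```

A descriptor built this way stays low-dimensional (a few hundred values for the grid above), so a linear model trains and evaluates quickly enough for a mobile search backend.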

[1] Tomás Pajdla, et al. Avoiding Confusing Features in Place Recognition, ECCV, 2010.

[2] Rongrong Ji, et al. Active query sensing for mobile location search, ACM Multimedia, 2011.

[3] Stephen Gould, et al. Multi-Class Segmentation with Relative Location Prior, International Journal of Computer Vision, 2008.

[4] A. Torralba, et al. Matching and Predicting Street Level Images, 2010.

[5] Antonio Torralba, et al. LabelMe: A Database and Web-Based Tool for Image Annotation, International Journal of Computer Vision, 2008.

[6] Bernd Girod, et al. Mobile Visual Search, IEEE Signal Processing Magazine, 2011.

[7] David A. Forsyth, et al. Matching Words and Pictures, Journal of Machine Learning Research, 2003.

[8] Chih-Jen Lin, et al. LIBLINEAR: A Library for Large Linear Classification, Journal of Machine Learning Research, 2008.

[9] Antonio Torralba, et al. Object Detection and Localization Using Local and Global Features, Toward Category-Level Object Recognition, 2006.

[10] Antonio Torralba, et al. Contextual Priming for Object Detection, International Journal of Computer Vision, 2003.

[11] Cordelia Schmid, et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, CVPR, 2006.

[12] Bill Triggs, et al. Histograms of oriented gradients for human detection, CVPR, 2005.

[13] Andrew Zisserman, et al. Video Google: a text retrieval approach to object matching in videos, ICCV, 2003.

[14] Jitendra Malik, et al. Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons, International Journal of Computer Vision, 2001.