Augmenting mobile city-view image retrieval with context-rich user-contributed photos

With the proliferation of mobile devices, the demand for location-based services is growing. Taking advantage of GPS information, we can roughly estimate a user's location. However, extra information (e.g., photos) is needed to precisely locate the object of interest on a mobile device for further applications such as mobile search. A user can simply take a GPS-tagged picture of a target of interest to retrieve information about the building, so building a real-time building recognition or retrieval system becomes a challenging problem. Most recent approaches recognize buildings from street-view images; however, query photos taken with mobile devices usually exhibit different lighting conditions. To provide a more robust city-view image retrieval system, we propose to augment the visual diversity of the database images by integrating context-rich user-contributed photos from social media. Preliminary experimental results show that street-view images provide different viewing angles of the target, whereas user-contributed photos enhance its visual diversity. In addition, we combine both visual and GPS constraints in the retrieval process over an inverted index to achieve real-time retrieval.
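To make the last point concrete, the sketch below shows one way a GPS constraint can be combined with inverted-index retrieval over quantized visual words. This is a minimal illustration only, assuming images have already been quantized into bag-of-visual-words representations; the class and function names (GeoVisualIndex, haversine_km) and the simple normalized-term-count scoring are hypothetical and are not the paper's actual implementation.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS coordinates, in kilometers."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


class GeoVisualIndex:
    """Illustrative inverted index over visual words with a GPS pre-filter."""

    def __init__(self):
        self.postings = defaultdict(list)  # visual word id -> list of image ids
        self.gps = {}                      # image id -> (lat, lon)
        self.doc_len = defaultdict(int)    # image id -> number of indexed words

    def add_image(self, image_id, visual_words, lat, lon):
        """Index one database image (street-view or user-contributed photo)."""
        self.gps[image_id] = (lat, lon)
        for w in visual_words:
            self.postings[w].append(image_id)
            self.doc_len[image_id] += 1

    def query(self, visual_words, lat, lon, radius_km=1.0, top_k=10):
        """Score only images whose GPS position lies within radius_km of the query."""
        candidates = {img for img, (la, lo) in self.gps.items()
                      if haversine_km(lat, lon, la, lo) <= radius_km}
        scores = defaultdict(float)
        for w in visual_words:
            for img in self.postings.get(w, []):
                if img in candidates:
                    scores[img] += 1.0 / self.doc_len[img]  # simple normalized term count
        return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]
```

In such a scheme, the GPS constraint shrinks the candidate set before visual-word scoring, which is one plausible way to keep retrieval latency low enough for a real-time mobile system.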
