KISS: Knowing Camera Prototype System for Recognizing and Annotating Places-of-Interest

This paper presents a project called the KnowIng camera prototype SyStem (KISS) for real-time recognition and annotation of places-of-interest (POIs) in smartphone photos, using online geotagged images of POIs as the knowledge base. We propose a "Spatial+Visual" (S+V) framework that combines a probabilistic field-of-view (pFOV) model in the spatial phase with a sparse-coding similarity metric in the visual phase to recognize phone-captured POIs. Moreover, we put forward an offline Collaborative Salient Area (COSTAR) mining algorithm that detects common visual features (called Costars) among the noisy photos geotagged at each POI, thereby cleaning the geotagged image database. The mining result is used to annotate the region-of-interest on the query image during online query processing, and the mining procedure also improves the efficiency and accuracy of the S+V framework. Furthermore, we extend the pFOV model into a Bayesian FOV ($\beta$FOV) model, which improves the spatial recognition accuracy by more than 30 percent and further reduces the visual computation. From a Bayesian point of view, the probability of a certain POI being captured by a phone is a prior probability in the pFOV model, whereas it becomes a posterior probability in the $\beta$FOV model. Our experiments on a real-world dataset and the Oxford 5K dataset show promising recognition results. To provide a fine-grained annotation ground truth, we labeled a new dataset based on Oxford 5K and make it publicly available on the web. Our COSTAR mining technique outperforms the state-of-the-art approach on both datasets.
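
As a rough sketch of the prior-versus-posterior relationship described above (the notation here is assumed for illustration and is not taken from the paper): let $o$ denote a candidate POI and let $c=(l,\theta)$ be the camera location and heading recorded with the photo. The pFOV model scores $o$ by a purely spatial prior probability that $o$ falls inside the camera's field of view, while the $\beta$FOV model treats that spatial evidence as a likelihood in a Bayes update and ranks POIs by the posterior
\[
P(o \mid c) \;=\; \frac{P(c \mid o)\,P(o)}{\sum_{o'} P(c \mid o')\,P(o')},
\]
where $P(o)$ could, for instance, be estimated from how frequently $o$ is geotagged in the image collection. Under such a scheme only the top-ranked POIs would need to be passed to the visual (sparse-coding) phase, which is consistent with the reduction in visual computation reported in the abstract.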
