Point of Interest Detection and Visual Distance Estimation for Sensor-Rich Video

Advances in camera sensor technology and the ubiquity of camera-equipped devices have made it straightforward for users to capture and share videos. Geo-tagged photos and videos are accumulating continuously on the web, which makes mining this type of media data a challenging problem. In one application scenario, users may want to know which Points of Interest (POIs), that is, the important objects or places, appear in a video. Existing solutions examine the visual content of the videos to recognize objects and events. This is typically time-consuming and computationally expensive, and the quality of the results is uneven, so these methods are difficult to apply to large video repositories. We propose a novel technique that leverages sensor-generated meta-data (camera locations and viewing directions), which is acquired automatically as a continuous stream together with the video frames. Existing smartphones can easily accommodate such integrated recording tasks. By considering a collective set of videos and leveraging the acquired auxiliary meta-data, our approach detects interesting regions and objects (POIs) and estimates their distances from the camera positions in a fully automated way. Because it is computationally efficient, the proposed method scales well, and our experiments show promising results.
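The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes camera positions have already been projected to local planar coordinates in metres, that headings are compass angles (0 = north, 90 = east), and that a frame can see up to a fixed maximum distance. Each frame casts a viewing ray onto a grid, the cell crossed by the most rays is taken as a POI candidate, and the distance from each camera to that cell centre is reported. The names `cast_ray` and `detect_poi`, the grid cell size, and the visibility range are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def cast_ray(x, y, heading_deg, max_dist, cell):
    """Yield each grid cell crossed by a viewing ray, sampled every cell/2 metres."""
    theta = math.radians(heading_deg)
    # Compass convention (assumed): 0 deg points north (+y), 90 deg points east (+x).
    dx, dy = math.sin(theta), math.cos(theta)
    seen = set()
    steps = int(max_dist / (cell / 2))
    for i in range(1, steps + 1):
        d = i * cell / 2
        c = (math.floor((x + d * dx) / cell), math.floor((y + d * dy) / cell))
        if c not in seen:  # count each cell at most once per ray
            seen.add(c)
            yield c


def detect_poi(frames, max_dist=100.0, cell=10.0):
    """Return (poi_centre, vote_count, camera_to_poi_distances).

    frames: iterable of (x, y, heading_deg) tuples, one per video frame.
    """
    frames = list(frames)
    votes = Counter()
    for x, y, heading in frames:
        votes.update(cast_ray(x, y, heading, max_dist, cell))
    (cx, cy), count = votes.most_common(1)[0]
    poi = ((cx + 0.5) * cell, (cy + 0.5) * cell)  # centre of the winning cell
    dists = [math.hypot(poi[0] - x, poi[1] - y) for x, y, _ in frames]
    return poi, count, dists


# Three hypothetical frames whose viewing directions all pass through (55, 55).
frames = [(55.0, 5.0, 0.0), (5.0, 55.0, 90.0), (5.0, 5.0, 45.0)]
poi, count, dists = detect_poi(frames)
print(poi, count, [round(d, 1) for d in dists])
```

A grid vote of this kind touches only the meta-data stream and never decodes video frames, which is in the spirit of the efficiency argument above. A practical system would also have to account for sensor noise, the camera's field of view, and occlusion, none of which this sketch models.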
