Querying geo-tagged videos for vision applications using spatial metadata

In this paper, we propose a novel geospatial image and video filtering tool (GIFT) that selects the most relevant input images and videos for computer vision applications operating on geo-tagged mobile videos. GIFT tightly couples mobile media content with its geospatial metadata to support fine-grained video manipulation in the spatial and temporal domains, and it intelligently indexes fields of view (FOVs) to handle large volumes of data. To demonstrate the effectiveness of GIFT, we introduce an end-to-end application that uses mobile videos to achieve persistent target tracking across large areas and long time spans. Our experimental results show promising performance for vision applications built on GIFT: lower communication load and improved efficiency, accuracy, and scalability compared with baseline approaches that do not fully exploit geospatial metadata.
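
To make the idea of filtering by field of view concrete, the following is a minimal sketch, not GIFT's actual implementation. It assumes each video frame carries a hypothetical FOV record modeled as a circular sector (camera position in local planar coordinates, compass heading, viewable angle, visible distance, and timestamp), and it scans the records linearly where a real system would use a spatial index. The names `FOV`, `covers`, and `filter_fovs` are illustrative only and do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class FOV:
    """Hypothetical per-frame field-of-view record (assumed sector model)."""
    x: float            # camera position, meters east in a local planar frame
    y: float            # camera position, meters north in a local planar frame
    heading_deg: float  # compass direction of the optical axis (0 = north, clockwise)
    angle_deg: float    # total viewable angle of the sector
    radius_m: float     # maximum visible distance
    t: float            # capture timestamp in seconds


def covers(fov: FOV, px: float, py: float) -> bool:
    """Return True if point (px, py) lies inside the FOV sector."""
    dx, dy = px - fov.x, py - fov.y
    dist = math.hypot(dx, dy)
    if dist > fov.radius_m:
        return False
    if dist == 0.0:
        return True  # the point coincides with the camera position
    # Compass bearing from camera to point: atan2(east, north), clockwise from north.
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    # Smallest absolute angular difference between bearing and heading, in [0, 180].
    diff = abs((bearing - fov.heading_deg + 180.0) % 360.0 - 180.0)
    return diff <= fov.angle_deg / 2.0


def filter_fovs(fovs, px, py, t_start, t_end):
    """Keep only frames whose FOV covers the point within the time window."""
    return [f for f in fovs if t_start <= f.t <= t_end and covers(f, px, py)]
```

Only the frames returned by `filter_fovs` would need to be fetched and passed to a vision algorithm, which is the kind of saving in communication and computation the abstract refers to. An indexed version would replace the linear scan with a lookup in a spatial structure.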
