Retrieving geo-location of videos with a divide & conquer hierarchical multimodal approach

This paper presents a strategy to identify the geographic location of videos. First, it relies on a multimodal cascade pipeline that exploits the available sources of information, namely the user's upload history, the user's social network, and a visual matching technique. Second, it introduces a novel divide & conquer strategy to better exploit the tags associated with the input video: it pre-selects one or several geographic areas of interest with higher expected relevance, then performs a deeper analysis inside the selected area(s) to return the coordinates most likely to be related to the input tags. The experiments were conducted as part of the MediaEval 2012 Placing Task. Our approach, which differs significantly from the other submitted techniques, achieves the best results on this benchmark when compared under the same amount of external information, i.e. without gazetteers or any other kind of external resource.
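
The two-stage tag analysis described above can be illustrated with a minimal sketch. It is not the paper's implementation: the fixed 10-degree grid, the overlap-count scoring, and the names `cell_of` and `locate` are all assumptions chosen for illustration. The sketch only shows the general shape of the idea, which is to pick coarse areas first and then search inside them.

```python
import math
from collections import Counter, defaultdict

def cell_of(lat, lon, size_deg=10.0):
    """Map a coordinate to a coarse grid cell (hypothetical fixed-size grid)."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))

def locate(tags, training, size_deg=10.0, top_k=1):
    """Estimate coordinates for a set of tags.

    training: list of (lat, lon, set_of_tags) for geotagged items.
    Returns (lat, lon), or None when no training tag matches.
    """
    query = set(tags)
    cell_scores = Counter()
    members = defaultdict(list)
    # Divide: score each coarse cell by the number of tag matches it holds.
    for lat, lon, item_tags in training:
        cell = cell_of(lat, lon, size_deg)
        members[cell].append((lat, lon, item_tags))
        cell_scores[cell] += len(query & item_tags)
    selected = [c for c, s in cell_scores.most_common(top_k) if s > 0]
    if not selected:
        return None
    # Conquer: inside the selected cells only, keep the item whose tags
    # overlap most with the query (an assumed, simplistic relevance score).
    candidates = (item for c in selected for item in members[c])
    best = max(candidates, key=lambda item: len(query & item[2]))
    return best[0], best[1]

training = [
    (48.85, 2.35, {"paris", "eiffel"}),
    (48.86, 2.29, {"paris", "eiffel", "tower"}),
    (40.70, -74.00, {"newyork", "tower"}),
]
print(locate(["eiffel", "tower"], training))
```

In this toy data the cell containing both Paris items collects the most tag matches, so the finer search is restricted to that cell. A real system would presumably replace the overlap count with a weighted text-retrieval score and fall back to other modalities when `locate` returns `None`.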
