Automatic tagging and geotagging in video collections and communities

Automatically generated tags and geotags hold great promise for improving access to video collections and online communities. We give an overview of three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, its definition, and the data set released. For each task, we present a reference algorithm that was used within MediaEval 2010 and comment on lessons learned. The Tagging Task, Professional involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task, Wild Wild Web involves automatically predicting the tags that users assign to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information, including user-generated metadata, speech recognition transcripts, audio, and visual features.
