Blip10000: a social video dataset containing SPUG content for tagging and retrieval

The growing volume of digital multimedia content is inspiring new types of user interaction with video data. Users want to find content easily by searching and browsing. Techniques are therefore needed for automatically categorising content, searching it, and linking it to related information. In this work, we present a dataset that contains comprehensive semi-professional user-generated (SPUG) content, including audiovisual content, user-contributed metadata, automatic speech recognition transcripts, automatic shot boundary files, and social information for multiple 'social levels'. We describe the principal characteristics of this dataset and present results that have been achieved on different tasks.
