Enriching media fragments with named entities for video classification

With the steady increase in videos published on media sharing platforms such as Dailymotion and YouTube, increasing effort is being spent on automatically annotating and organizing these videos. In this paper, we propose a framework for classifying video items using both textual features, such as named entities extracted from subtitles, and temporal features, such as the duration of the media fragments in which particular entities are spotted. We implement four machine learning algorithms for multiclass classification, namely Logistic Regression (LG), K-Nearest Neighbour (KNN), Naive Bayes (NB) and Support Vector Machine (SVM). We study the temporal distribution patterns of named entities extracted from 805 Dailymotion videos. The results show that the best performance using the entity distribution is obtained with KNN (overall accuracy of 46.58%), while the best performance using the temporal distribution of named entities for each type is obtained with SVM (overall accuracy of 43.60%). We conclude that this approach is promising for automatically classifying online videos.
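The two feature families described above can be illustrated with a minimal sketch. It is not the authors' implementation: the entity type names, the fragment dictionary layout, the category labels and the toy training vectors are all assumptions made for illustration, and a small hand-written nearest-neighbour vote stands in for the KNN classifier named in the abstract.

```python
import math
from collections import Counter

# Hypothetical entity types; the type inventory used in the paper is not given here.
ENTITY_TYPES = ["Person", "Location", "Organization"]


def entity_distribution(fragments):
    """Fraction of entity mentions that fall under each entity type."""
    counts = Counter(f["type"] for f in fragments)
    total = sum(counts.values()) or 1  # avoid division by zero for empty videos
    return [counts[t] / total for t in ENTITY_TYPES]


def temporal_distribution(fragments, video_duration):
    """Share of the video duration covered by fragments of each entity type."""
    seconds = Counter()
    for f in fragments:
        seconds[f["type"]] += f["end"] - f["start"]
    return [seconds[t] / video_duration for t in ENTITY_TYPES]


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Assumed input: media fragments (start/end in seconds) where an entity was spotted.
fragments = [
    {"type": "Person", "start": 0.0, "end": 4.0},
    {"type": "Person", "start": 10.0, "end": 12.0},
    {"type": "Location", "start": 20.0, "end": 24.0},
]

# Toy labelled feature vectors; the labels are hypothetical video categories.
train = [
    ([0.8, 0.1, 0.1], "news"),
    ([0.7, 0.2, 0.1], "news"),
    ([0.1, 0.8, 0.1], "travel"),
    ([0.2, 0.7, 0.1], "travel"),
]

entity_features = entity_distribution(fragments)
temporal_features = temporal_distribution(fragments, video_duration=100.0)
predicted = knn_predict(train, entity_features, k=3)
```

Here `entity_features` captures how often each entity type is mentioned, while `temporal_features` captures how much of the video each type occupies. Either vector could be passed to any of the four classifiers listed in the abstract.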
