Text‐based video content classification for online video‐sharing sites

With the emergence of Web 2.0, sharing personal content, communicating ideas, and interacting with others in online communities have become daily routines for online users. User‐generated data from Web 2.0 sites provide rich personal information (e.g., personal preferences and interests) and can be used to gain insight into cyber communities and their social networks. Many studies have leveraged user‐generated information to analyze blogs and forums, but few have applied this approach to video‐sharing Web sites. In this study, we propose a text‐based framework for video content classification on online video‐sharing Web sites. Different types of user‐generated data (e.g., titles, descriptions, and comments) were used as proxies for online videos, and three types of text features (lexical, syntactic, and content‐specific) were extracted. Three feature‐based classification techniques (C4.5, Naïve Bayes, and Support Vector Machine) were used to classify the videos. To evaluate the proposed framework, we first collected user‐generated data from candidate videos, which were identified by searching YouTube with user‐given keywords. A subset of the collected data was then randomly selected and manually tagged by users to serve as the experimental data. The experimental results showed that the proposed approach classified online videos according to users' interests with accuracy of up to 87.2%, and that all three types of text features contributed to discriminating between videos. Support Vector Machine outperformed C4.5 and Naïve Bayes in our experiments. In addition, our case study demonstrated that accurate video‐classification results are useful for identifying implicit cyber communities on video‐sharing Web sites.
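As a rough illustration of the feature-extraction step described above, the following minimal sketch joins a video's title, description, and comments into one text proxy and computes a few features of each of the three types. It is not the study's implementation: the function names `video_proxy` and `extract_features`, the short `FUNCTION_WORDS` list, and the specific features chosen are all assumptions made for illustration, and the actual feature set is not given in this abstract.

```python
import re
import string
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption for
# illustration; the study's real syntactic feature set is not specified here).
FUNCTION_WORDS = ("the", "a", "of", "and", "to", "in", "is")


def video_proxy(title, description, comments):
    """Join the user-generated text of one video into a single string."""
    return " ".join([title, description] + list(comments))


def extract_features(text, content_terms):
    """Return a dict of lexical, syntactic, and content-specific features."""
    words = re.findall(r"[a-z']+", text.lower())
    n_words = max(len(words), 1)   # avoid division by zero on empty text
    n_chars = max(len(text), 1)
    counts = Counter(words)

    features = {
        # Lexical: character- and word-level statistics.
        "lex_avg_word_len": sum(len(w) for w in words) / n_words,
        "lex_vocab_richness": len(counts) / n_words,
        "lex_digit_ratio": sum(c.isdigit() for c in text) / n_chars,
        # Syntactic: punctuation usage.
        "syn_punct_ratio": sum(c in string.punctuation for c in text) / n_chars,
    }
    # Syntactic: relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        features["syn_fw_" + fw] = counts[fw] / n_words
    # Content-specific: relative frequency of caller-supplied topic terms.
    for term in content_terms:
        features["con_" + term] = counts[term] / n_words
    return features


if __name__ == "__main__":
    text = video_proxy("Cat video", "A cat and a dog", ["the cat is cute!"])
    for name, value in sorted(extract_features(text, ["cat", "music"]).items()):
        print(f"{name}: {value:.3f}")
```

The resulting dictionaries could be converted to fixed-length vectors and passed to any feature-based classifier, such as a decision tree, Naïve Bayes, or a Support Vector Machine.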
