Parallel neural networks for multimodal video genre classification

Improvements in digital technology have made possible the production and distribution of huge quantities of digital multimedia data. Tools for high-level multimedia documentation are becoming indispensable for efficiently accessing and retrieving desired content from such data. In this context, automatic genre classification provides a simple and effective way to describe multimedia content in a structured and easily understandable manner. In this article we propose a methodology for classifying the genre of television programmes. Features are extracted from four information sources: visual-perceptual information (colour, texture and motion), structural information (shot length, shot distribution, shot rhythm, shot cluster duration and saturation), cognitive information (face properties, such as number, positions and dimensions) and aural information (transcribed text, sound characteristics). These features are used to train a parallel neural network system able to distinguish between seven video genres: football, cartoons, music, weather forecast, newscast, talk show and commercials. Experiments conducted on more than 100 h of audiovisual material confirm the effectiveness of the proposed method, which reaches a classification accuracy of 95%.
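As a rough illustration of the idea of a parallel neural network system, the sketch below runs one small network per information source and merges their genre probabilities. It is only a minimal sketch under stated assumptions: the abstract does not say how the networks are structured or combined, so the feature dimensions, the hidden-layer size, the `SmallMLP` class and the averaging rule are all hypothetical, and the weights are random rather than trained.

```python
import numpy as np

# The seven genres named in the abstract.
GENRES = ["football", "cartoons", "music", "weather forecast",
          "newscast", "talk show", "commercials"]

# Hypothetical feature-vector sizes per information source (not from the paper).
FEATURE_DIMS = {"visual": 12, "structural": 5, "cognitive": 3, "aural": 8}


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class SmallMLP:
    """One-hidden-layer network for a single feature source (untrained)."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def predict_proba(self, x):
        h = np.tanh(x @ self.W1 + self.b1)
        return softmax(h @ self.W2 + self.b2)


def classify(networks, features):
    """Average per-source genre probabilities (an assumed combination rule)."""
    per_source = [networks[name].predict_proba(features[name]) for name in networks]
    probs = np.mean(per_source, axis=0)
    return GENRES[int(np.argmax(probs))], probs


rng = np.random.default_rng(0)
networks = {name: SmallMLP(dim, 16, len(GENRES), rng)
            for name, dim in FEATURE_DIMS.items()}
# Random stand-ins for the features extracted from one programme.
features = {name: rng.normal(size=dim) for name, dim in FEATURE_DIMS.items()}

genre, probs = classify(networks, features)
print(genre, probs.round(3))
```

Averaging is used here only because it is the simplest way to merge several probability vectors; a trained system could just as well weight the sources or feed their outputs into a further classifier.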
