A semantic framework for video genre classification and event analysis

Semantic video analysis is a key issue in digital video applications, including video retrieval, annotation, and management. Most existing work focuses on event detection for specific video genres, while genre classification is treated as a separate, independent problem. In this paper, we present a semantic framework that jointly performs weakly supervised video genre classification and event analysis, using probabilistic models for MPEG video streams. Several computable semantic features that reflect the event attributes are derived. Based on an intensive analysis of the connection between video genres and the contextual relationships among events, as well as the statistical characteristics of dominant events, an analysis algorithm based on a hidden Markov model (HMM) and a naive Bayesian classifier (NBC) is proposed for video genre classification. A Gaussian mixture model (GMM) is built to detect the contained events using the same semantic features, and an event adjustment strategy is proposed based on an analysis of the GMM structure and the predefined video events. Subsequently, a special event is recognized from the detected events by a second HMM. Experiments on video genre classification and event analysis, conducted on a large number of video data sets, demonstrate the promising performance of the proposed framework for semantic video analysis.
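To make the HMM-based genre decision concrete, the following is a minimal sketch of one way it could work: score a sequence of detected events under one discrete HMM per genre and pick the genre with the highest likelihood. This is not the paper's algorithm. The NBC component and the semantic features are omitted, and the genre names, event symbols (0, 1, 2), and all probabilities in `MODELS` are hypothetical values chosen for illustration.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) for a discrete HMM via the scaled forward algorithm."""
    n_states = len(start)
    # Initialisation: start probability times emission of the first symbol.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Recursion: propagate through the transitions, then emit obs[t].
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik


def classify_genre(obs, models):
    """Pick the genre whose HMM gives the observed event sequence the highest likelihood."""
    return max(models, key=lambda g: forward_log_likelihood(obs, *models[g]))


# Hypothetical toy models: (start, transition, emission) per genre,
# with two hidden states and three event symbols.
MODELS = {
    "sports": (
        [0.6, 0.4],
        [[0.7, 0.3], [0.4, 0.6]],
        [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]],
    ),
    "news": (
        [0.5, 0.5],
        [[0.9, 0.1], [0.1, 0.9]],
        [[0.1, 0.1, 0.8], [0.2, 0.2, 0.6]],
    ),
}

if __name__ == "__main__":
    print(classify_genre([0, 1, 0, 0, 1], MODELS))
    print(classify_genre([2, 2, 2, 2], MODELS))
```

In a real system the model parameters would be estimated from training data (for example with Baum-Welch) rather than set by hand, and the per-genre likelihoods could be combined with other evidence such as dominant-event statistics.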
