Classifying motion picture soundtrack for video indexing

We investigate a method for classifying patterns with temporal support. The method combines the discriminative ability of a nonlinear kernel-based classifier (a support vector machine, SVM) with the ability of a first-order Markov chain to model temporal transitions. We apply it to the task of classifying motion picture soundtracks. Experiments on classifying the soundtrack into speech and non-speech audio patterns show that the proposed method improves classification performance over both hidden Markov model (HMM)-based classification and SVM-based classification. We normalize the margin obtained from the SVM and map it to a non-negative confidence measure bounded by 1; using the constraints on transitions between the two classes, we then attempt to alter the classification of patterns that lie close to the separating boundary. Classifying the soundtrack into semantic classes can help users browse and index video efficiently.
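
The idea of correcting low-confidence SVM decisions with transition constraints can be illustrated with a minimal sketch. It is not the authors' implementation: the logistic mapping from margin to confidence, the `scale` and `stay_prob` parameters, the uniform prior over the first frame, and the use of Viterbi decoding are all assumptions chosen for illustration, and the function names are hypothetical. The input is a sequence of signed SVM margins, one per audio frame, with positive values meaning speech.

```python
import math


def margin_to_confidence(margin, scale=1.0):
    """Map a signed SVM margin to a speech confidence in (0, 1).

    The logistic map is an assumption; the abstract only states that the
    confidence is non-negative and bounded by 1.
    """
    z = max(-50.0, min(50.0, scale * margin))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-z))


def smooth_labels(margins, stay_prob=0.9, scale=1.0):
    """Viterbi decoding over two classes: 0 = non-speech, 1 = speech.

    stay_prob is the assumed probability of remaining in the same class
    from one frame to the next (first-order Markov chain).
    """
    if not margins:
        return []
    log_stay = math.log(stay_prob)
    log_switch = math.log(1.0 - stay_prob)
    log_trans = [[log_stay, log_switch], [log_switch, log_stay]]
    eps = 1e-12

    def log_emit(m):
        c = margin_to_confidence(m, scale)
        return [math.log(max(1.0 - c, eps)), math.log(max(c, eps))]

    # Uniform prior over the class of the first frame (an assumption).
    score = [math.log(0.5) + e for e in log_emit(margins[0])]
    back = []
    for m in margins[1:]:
        emit = log_emit(m)
        new_score, ptr = [], []
        for j in (0, 1):
            cands = [score[i] + log_trans[i][j] for i in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + emit[j])
        back.append(ptr)
        score = new_score

    # Trace back the most likely class sequence.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

With the default parameters, `smooth_labels([2.0, 1.5, -0.1, 1.8, 2.2])` returns `[1, 1, 1, 1, 1]`: the third frame, which a frame-by-frame threshold would label non-speech because its margin is slightly negative, is relabeled as speech because two class switches cost more than the weak evidence against speech. A sustained run of strongly negative margins, as in `[2.0, 2.0, -3.0, -3.0, -3.0, 2.0, 2.0]`, is kept as non-speech.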
