A two level strategy for audio segmentation

This paper addresses audio segmentation. Audio tracks are split into short sequences, each of which is assigned to one of several classes; every sequence can then be analyzed further according to its class. We first describe simple techniques for segmentation into two or three classes. These methods rely on amplitude, spectral, or cepstral analysis and on classical Hidden Markov Models. Motivated by the limitations of these approaches, we propose a two-level segmentation process. Segmentation is performed by computing several features for each audio sequence, either on a complete audio segment or on a frame (a set of samples) that is a subset of the segment. The proposed approach to microsegmentation of audio data combines a K-means classifier at the segment level with a Multidimensional Hidden Markov Model system that uses the frame decomposition of the signal. A first classification is obtained using the K-means classifier and segment-based features. The final result then comes from Multidimensional Hidden Markov Models applied to frame-based features together with these intermediate results. Multidimensional Hidden Markov Models extend classical Hidden Markov Models to multicomponent data. They are particularly well suited to our case, in which each audio segment can be characterized by several features of different natures. We illustrate our methods on the analysis of football audio tracks.
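The first (segment-level) stage described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. It makes several assumptions: the frame features are short-time energy and zero-crossing rate, since the abstract names only amplitude, spectral, or cepstral analysis without specifics. The segment features are taken as per-frame means. The K-means centers are initialised with a deterministic farthest-point rule to keep the example reproducible. All function names are hypothetical. The second stage, with Multidimensional Hidden Markov Models on the frame sequences, is not shown.

```python
import math
import random


def split_frames(segment, frame_len):
    """Split a segment (list of samples) into non-overlapping frames."""
    return [segment[i:i + frame_len]
            for i in range(0, len(segment) - frame_len + 1, frame_len)]


def frame_features(frame):
    """Assumed frame-level features: mean energy and zero-crossing rate."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return (energy, crossings / (len(frame) - 1))


def segment_features(segment, frame_len):
    """Assumed segment-level features: average of the frame features."""
    feats = [frame_features(f) for f in split_frames(segment, frame_len)]
    return tuple(sum(f[d] for f in feats) / len(feats) for d in range(2))


def kmeans(points, k, iters=20):
    """Plain Lloyd iterations with farthest-point initialisation (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(m[d] for m in members) / len(members)
                                   for d in range(len(members[0])))
    return labels, centers


# Synthetic stand-ins for audio segments: two quiet tones, two loud noise bursts.
n = 800
rng = random.Random(0)
segments = [
    [0.1 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)],
    [0.2 * math.sin(2 * math.pi * 8 * t / n) for t in range(n)],
    [rng.uniform(-1.0, 1.0) for _ in range(n)],
    [rng.uniform(-1.0, 1.0) for _ in range(n)],
]
points = [segment_features(s, frame_len=100) for s in segments]
labels, centers = kmeans(points, k=2)
print(labels)
```

In a two-level scheme of the kind the abstract describes, the labels produced here would serve only as intermediate results, to be refined by the frame-level models.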
