Two-stage cascaded classification approach based on genetic fuzzy learning for speech/music discrimination

Automatic discrimination between speech and music is an important tool in many multimedia applications. This paper presents a robust and effective approach to speech/music discrimination that relies on a two-stage cascaded classification scheme, composed of a statistical pattern recognition classifier followed by a genetic fuzzy system. To assess the robustness of the proposed scheme, other widely used classifiers, such as neural networks and support vector machines, have also been considered for the first stage, and a comparison with well-proven signal features is provided. The two most commonly used genetic learning approaches (Michigan and Pittsburgh) have been evaluated within the two-stage scheme. The genetic fuzzy system yields an improvement of about 4% in classification accuracy, and experimental results confirm the good performance of the proposed approach, with a classification accuracy of about 97% for the best trial.
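The following is a minimal sketch of the kind of two-stage cascade the abstract describes, not the authors' actual system. It assumes frame-level feature vectors as input, uses a Gaussian naive Bayes model as a stand-in for the statistical first stage, and hand-writes a toy fuzzy rule on zero-crossing-rate variance for the second stage; in the paper that rule base is learned with a genetic algorithm (Michigan or Pittsburgh style), which is omitted here for brevity. The class name, thresholds, and features are illustrative assumptions.

```python
# Hedged sketch of a two-stage speech/music cascade:
#   stage 1: statistical classifier on frame features
#   stage 2: fuzzy rules applied only to low-confidence frames
# (In the paper the fuzzy rule base is evolved by a GA; here it is hand-written.)
import numpy as np
from sklearn.naive_bayes import GaussianNB


def triangular(x, a, b, c):
    """Triangular membership function on [a, c] peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


class CascadeSpeechMusic:
    """Hypothetical cascade: 0 = speech, 1 = music."""

    def __init__(self, confidence_threshold=0.8):
        self.stage1 = GaussianNB()              # statistical first stage
        self.threshold = confidence_threshold   # when to defer to stage 2

    def fit(self, X, y):
        self.stage1.fit(X, y)
        return self

    def _fuzzy_stage(self, zcr_var):
        """Toy rules on zero-crossing-rate variance normalised to [0, 1]:
        IF zcr_variance IS high THEN speech; IF zcr_variance IS low THEN music."""
        high = triangular(zcr_var, 0.3, 0.7, 1.0)
        low = triangular(zcr_var, 0.0, 0.2, 0.5)
        return 0 if high >= low else 1

    def predict(self, X, zcr_variance):
        proba = self.stage1.predict_proba(X)
        labels = proba.argmax(axis=1)
        confident = proba.max(axis=1) >= self.threshold
        # Only ambiguous frames reach the fuzzy second stage.
        for i in np.where(~confident)[0]:
            labels[i] = self._fuzzy_stage(zcr_variance[i])
        return labels
```

The point of the design is that the second stage never overrides confident first-stage decisions; it only refines the uncertain ones, which is where the roughly 4% accuracy gain reported in the abstract would plausibly come from.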
