Audio Coding Improvement Using Evolutionary Speech/Music Discrimination

Automatic speech/music discrimination is an important tool used in many multimedia applications, becoming a research topic of interest in the last years. This paper presents our last works in the speech/music discrimination field, aiming to improve the coding efficiency of standard audio coders (i.e. MP3, AAC) when speech and music signals are involved. In order to discriminate between speech and music, a fuzzy rules-based expert system is incorporated into the decision-taking stage of traditional speech/music discrimination systems. The knowledge base of the fuzzy expert system has been obtained by means of a typical genetic learning algorithm (the Pittsburgh algorithm). The proposed speech/music discrimination scheme manages the operation of an intelligent audio coder, which selects a GSM coder for speech frames and an AAC coder for music ones, resulting in a lower bit rate regarding the case of using a standardized audio coder (AAC in this work). Further, the intelligent audio coder has been designed aiming to obtain a similar subjective audio quality than AAC. GSM operates at 13 kbits/s, while in the experiments the bit rate specification for AAC has been 32 kbits/s for one-channel audio signals.

[1]  David G. Stork,et al.  Pattern Classification , 1973 .

[2]  Lashon B. Booker,et al.  Intelligent Behavior as an Adaptation to the Task Environment , 1982 .

[3]  Malcolm Slaney,et al.  Construction and evaluation of a robust multifeature speech/music discriminator , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4]  Wen Gao,et al.  A fast and robust speech/music discrimination approach , 2003, Fourth International Conference on Information, Communications and Signal Processing, 2003 and the Fourth Pacific Rim Conference on Multimedia. Proceedings of the 2003 Joint.

[5]  George Tzanetakis,et al.  Musical genre classification of audio signals , 2002, IEEE Trans. Speech Audio Process..

[6]  John Saunders,et al.  Real-time discrimination of broadcast speech/music , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[7]  Liming Chen,et al.  Robust speech music discrimination using spectrum's first order statistics and neural networks , 2003, Seventh International Symposium on Signal Processing and Its Applications, 2003. Proceedings..

[8]  Nicolás Ruiz-Reyes,et al.  New speech/music discrimination approach based on warping transformation and ANFIS , 2006 .

[9]  Stephen F. Smith,et al.  A learning system based on genetic adaptive algorithms , 1980 .

[10]  Francisco Herrera,et al.  Genetic Fuzzy Systems - Evolutionary Tuning and Learning of Fuzzy Knowledge Bases , 2002, Advances in Fuzzy Systems - Applications and Theory.

[11]  Gilles Venturini,et al.  SIA: A Supervised Inductive Algorithm with Genetic Search for Learning Attributes based Concepts , 1993, ECML.

[12]  Luis Magdalena,et al.  Fuzzy Rule-Based Controllers that Learn by Evolving their Knowledge Base , 1996 .

[13]  J.E.M. Exposito,et al.  Expert system for intelligent audio codification based in speech/music discrimination , 2006, 2006 International Symposium on Evolving Fuzzy Systems.

[14]  H. Ishibuchi Genetic fuzzy systems: evolutionary tuning and learning of fuzzy knowledge bases , 2004 .

[15]  S. A. Ramprashad A multimode transform predictive coder (MTPC) for speech and audio , 1999, 1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351).

[16]  Rong-Yu Iao Mixed wideband speech and music coding using a speech/music discriminator , 1997, TENCON '97 Brisbane - Australia. Proceedings of IEEE TENCON '97. IEEE Region 10 Annual Conference. Speech and Image Technologies for Computing and Telecommunications (Cat. No.97CH36162).

[17]  Peter Kabal,et al.  Speech/music discrimination for multimedia applications , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).