Perceptual evaluation of blind source separation for robust speech recognition

A previous article reported an evaluation of several objective quality measures as predictors of speech recognition rate after the application of a blind source separation algorithm. In this work, those experiments were repeated using new measures based on the Perceptual Evaluation of Speech Quality (PESQ), the method standardized in ITU-T Recommendation P.862 for assessing speech quality in telephone networks and speech codecs. The raw PESQ score and a nonlinearly transformed PESQ score were evaluated, together with several composite measures. The results show that the PESQ-based measures predicted recognition rate better than all of the measures reported in the previous work. Based on these results, we recommend PESQ-based measures for evaluating blind source separation algorithms intended for automatic speech recognition.
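The abstract does not say how each measure is scored as a predictor, or which nonlinear transformation is applied to PESQ. The Python sketch below fills both gaps with assumptions. It uses the absolute Pearson correlation between per-condition measure values and recognition rates as the figure of merit. It uses the ITU-T P.862.1 logistic mapping from raw PESQ to MOS-LQO as one example of a nonlinear transformation. The article's own scoring and transformation may differ, and the function names `pesq_to_mos_lqo`, `pearson`, and `score_predictors` are hypothetical.

```python
import math


def pesq_to_mos_lqo(raw_pesq: float) -> float:
    """Map a raw PESQ (ITU-T P.862) score to MOS-LQO.

    Uses the ITU-T P.862.1 logistic mapping. This is an illustrative choice;
    the nonlinear transformation used in the article is assumed, not known.
    """
    return 0.999 + (4.999 - 0.999) / (1.0 + math.exp(-1.4945 * raw_pesq + 4.6607))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def score_predictors(measures, recognition_rates):
    """Rank quality measures by |r| against recognition rate, best first.

    `measures` maps a measure name to its per-condition values. The absolute
    value is used because distance-type measures correlate negatively with
    recognition rate yet can still be good predictors (an assumed convention).
    """
    scores = {
        name: abs(pearson(values, recognition_rates))
        for name, values in measures.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Suppose `raw` holds raw PESQ values and `wrr` holds word recognition rates, one entry per test condition. Then `score_predictors({"PESQ": raw, "MOS-LQO": [pesq_to_mos_lqo(p) for p in raw]}, wrr)` ranks the two variants. Pearson correlation measures only linear association, so a monotonic nonlinear mapping can change r. That is one reason to evaluate a transformed PESQ alongside the raw score.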
