Speech Enhancement Using Principal Component Analysis and Variance of the Reconstruction Error Model Identification

A Thesis Presented for the Master's Degree of Telecommunications

In recent years, Automatic Speech Recognition (ASR) systems designed to work in controlled environments with clean speech have reached very high levels of performance. However, recognition accuracy degrades severely when these systems operate in noisy environments. This thesis addresses the problem of single-channel speech enhancement. It begins with a survey of state-of-the-art enhancement methods, organized by category. Subspace-based speech enhancement, an important class of these methods, is presented in chapter 2. After a careful study of the strengths and drawbacks of this technique, a generalized form of Principal Component Analysis-based (PCA-based) speech enhancement is given. Chapter 3 investigates a vital issue in PCA-based enhancement, the identification of the clean speech signal's model, and presents some recent techniques for determining the rank of a clean speech signal. The rest of the thesis develops a novel technique for rank estimation: an approach to optimal subspace partitioning based on the Variance of the Reconstruction Error (VRE) criterion. This criterion provides consistent parameter estimates and allows us to implement an automatic noise reduction algorithm that can be applied directly to the observed data. This choice also overcomes many limitations of other selection criteria, such as overestimation of the signal subspace or the need for empirical parameters. We have also extended our subspace algorithm to handle colored and babble noise. Informal listening tests and illustrations confirm that the method is robust to noise regardless of the noise type.
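To make the idea of VRE-driven subspace partitioning concrete, the following is a minimal sketch and not the thesis's implementation. It assumes the VRE definition commonly attributed to Qin and Dunia: for each candidate rank l, the reconstruction-error variance of every variable is computed from the residual projector and normalized by that variable's variance, and the rank with the smallest sum is kept. The function names `vre_rank` and `pca_enhance`, the frame length of 16, the synthetic two-sinusoid signal, and the white-noise assumption are all hypothetical choices made for illustration. The enhancement step is a plain projection onto the retained principal components, without the gain weighting that subspace estimators often apply.

```python
import numpy as np


def vre_rank(R):
    """Pick the number of principal components that minimizes the VRE.

    R is an m x m symmetric covariance matrix. Candidate ranks are
    l = 1 .. m-1 (at l = m the residual projector is zero and the
    criterion is undefined).
    """
    m = R.shape[0]
    # eigh returns eigenvalues in ascending order; reverse the columns
    # so the leading columns span the dominant (signal) subspace.
    _, eigvecs = np.linalg.eigh(R)
    eigvecs = eigvecs[:, ::-1]
    var_x = np.diag(R)
    vre = np.full(m, np.inf)  # vre[l] for l retained components; vre[0] unused
    for l in range(1, m):
        P = eigvecs[:, :l]
        C_res = np.eye(m) - P @ P.T        # residual projector I - P P^T
        num = np.diag(C_res @ R @ C_res)   # residual-direction variance per variable
        den = np.diag(C_res) ** 2          # squared norm of each residual direction
        if np.any(den < 1e-12):
            continue  # some variable cannot be reconstructed at this rank
        vre[l] = np.sum(num / den / var_x)
    return int(np.argmin(vre)), vre


def pca_enhance(frames):
    """Project frames (n x m) onto the subspace whose rank is chosen by VRE."""
    mean = frames.mean(axis=0)
    X = frames - mean
    R = (X.T @ X) / (X.shape[0] - 1)       # sample covariance of the frames
    rank, _ = vre_rank(R)
    _, eigvecs = np.linalg.eigh(R)
    P = eigvecs[:, ::-1][:, :rank]
    return X @ P @ P.T + mean, rank


# Hypothetical test signal: two sinusoids in white Gaussian noise.
rng = np.random.default_rng(0)
t = np.arange(2000)
clean = np.sin(0.2 * t) + 0.5 * np.sin(0.5 * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)

# Build overlapping frames of length m (one-sample hop).
m = 16
idx = np.arange(m)[None, :] + np.arange(noisy.size - m + 1)[:, None]
frames = noisy[idx]

enhanced, rank = pca_enhance(frames)
print("selected rank:", rank)
```

The sketch stops at the projected frames; reassembling them into a time-domain signal (for example by averaging overlapping samples) and handling colored noise would require additional steps that are not shown here.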
