Suppression of Late Reverberation Effect on Speech Signal Using Long-Term Multiple-step Linear Prediction

A speech signal captured by a distant microphone is generally smeared by reverberation, which severely degrades automatic speech recognition (ASR) performance. One remedy is to dereverberate the observed signal prior to ASR. In this paper, a room impulse response is assumed to consist of three parts: a direct-path response, early reflections, and late reverberations. Since late reverberations are known to be a major cause of ASR performance degradation, this paper focuses on suppressing their effect. The proposed method first estimates the late reverberations using long-term multi-step linear prediction, and then reduces their effect by spectral subtraction. The algorithm achieved good dereverberation with training data amounting to a single speech utterance, in our case less than 6 s long. This paper describes the proposed framework for both single-channel and multichannel scenarios. Experiments on real recordings made under severely reverberant conditions showed substantial improvements in ASR performance.
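
The two stages described above can be illustrated with a minimal single-channel sketch. It is not the paper's implementation: the function names, the prediction delay and filter order, the STFT settings, the spectral floor, and the synthetic impulse response are all assumptions chosen for illustration, and the multichannel case is not covered. The sketch fits a delayed (multi-step) linear predictor by least squares over the one observed utterance, treats the predicted signal as the late-reverberation estimate, and subtracts its power spectrum from that of the observation while keeping the observed phase.

```python
import numpy as np

def delayed_lp_late_reverb(x, delay, order):
    """Predict x[n] from x[n-delay], ..., x[n-delay-order+1] by least squares.

    The prediction is used as a rough late-reverberation estimate
    (an illustrative simplification, not the paper's exact estimator).
    """
    n = len(x)
    start = delay + order - 1  # first sample with a full delayed history
    # Column k holds x[t - delay - k] for t = start .. n-1.
    past = np.stack(
        [x[start - delay - k : n - delay - k] for k in range(order)], axis=1
    )
    coeffs, *_ = np.linalg.lstsq(past, x[start:], rcond=None)
    late = np.zeros(n)
    late[start:] = past @ coeffs
    return late, coeffs

def stft(x, frame, hop):
    # Square-root periodic Hann window; with hop = frame // 2 the squared
    # windows sum to one, so analysis plus synthesis reconstructs the interior.
    window = np.sin(np.pi * np.arange(frame) / frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def istft(spec, frame, hop, length):
    window = np.sin(np.pi * np.arange(frame) / frame)
    out = np.zeros(length)
    for i, f in enumerate(np.fft.irfft(spec, n=frame, axis=1)):
        out[i * hop : i * hop + frame] += f * window  # overlap-add
    return out

def spectral_subtract(x, late, frame=512, floor=0.05):
    """Power spectral subtraction with a floor, reusing the observed phase."""
    hop = frame // 2
    X = stft(x, frame, hop)
    R = stft(late, frame, hop)
    power = np.abs(X) ** 2 - np.abs(R) ** 2
    power = np.maximum(power, (floor ** 2) * np.abs(X) ** 2)
    Y = np.sqrt(power) * np.exp(1j * np.angle(X))
    return istft(Y, frame, hop, len(x))

# Synthetic demonstration (assumed values throughout): a noise source
# convolved with a direct path plus an exponentially decaying tail.
rng = np.random.default_rng(0)
fs = 8000
source = rng.standard_normal(fs)
rir = 0.3 * rng.standard_normal(1600) * np.exp(-np.arange(1600) / 300.0)
rir[0] = 1.0
observed = np.convolve(source, rir)[:fs]

late, coeffs = delayed_lp_late_reverb(observed, delay=240, order=300)
enhanced = spectral_subtract(observed, late)
```

The prediction delay is what separates this from ordinary one-step linear prediction: samples closer than `delay` are excluded from the regressors, so the predictor can only explain the part of the current sample that is correlated with the more distant past. Subtracting in the power-spectral domain, rather than in the time domain, makes the result less sensitive to phase errors in the estimate, at the cost of reusing the reverberant phase. The half-frame at each end of the output is tapered by the window in this sketch.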
