Multi-microphone speech dereverberation using spatio-temporal averaging

The use of the source-filter speech production model in methods for enhancing reverberant speech has received considerable attention over the last few years. More recently, it has been shown that spatial averaging of the linear prediction (LP) coefficients is required to improve the accuracy of these types of algorithms. In this paper, we suggest and demonstrate experimentally that LP coefficients obtained from spatially averaged multi-channel speech signals give equally satisfactory results. Consequently, we propose a novel multi-channel speech dereverberation approach that operates on the LP residual and combines spatial averaging with a new technique based on inter-cycle temporal averaging. Simulation results and informal listening tests indicate an improvement in direct-to-reverberant sound ratio and in the perceived quality of the enhanced speech.
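As an illustration of the first step described above, the following is a minimal sketch, not the authors' implementation, of estimating LP coefficients from a spatially averaged multi-channel signal and computing the LP residual by inverse filtering. It assumes the microphone signals have already been time-aligned, uses the autocorrelation method for LP analysis, and omits the inter-cycle temporal averaging stage. The function names `lp_coefficients`, `lp_residual`, and `spatially_averaged_lp` are hypothetical.

```python
import numpy as np


def lp_coefficients(x, order):
    """Autocorrelation-method LP analysis (an assumed, standard choice).

    Returns a[0..order-1] such that x[n] is predicted by
    sum_k a[k-1] * x[n-k] for k = 1..order.
    """
    x = np.asarray(x, dtype=float)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations: R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])


def lp_residual(x, a):
    """Inverse-filter x with A(z) = 1 - sum_k a[k-1] z^{-k}."""
    x = np.asarray(x, dtype=float)
    inverse_filter = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    return np.convolve(x, inverse_filter)[: len(x)]


def spatially_averaged_lp(channels, order):
    """Average time-aligned channels, then run LP analysis on the average.

    channels: array of shape (num_microphones, num_samples); alignment of
    the channels is assumed to have been done beforehand.
    """
    averaged = np.mean(np.asarray(channels, dtype=float), axis=0)
    return averaged, lp_coefficients(averaged, order)
```

In this sketch the residual of the averaged signal would be obtained with `lp_residual(averaged, a)`; any further processing of that residual is outside what the code shows.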
