Distributed multichannel speech enhancement with minimum mean-square error short-time spectral amplitude, log-spectral amplitude, and spectral phase estimation

In this paper, the authors present optimal multichannel frequency-domain estimators for minimum mean-square error (MMSE) short-time spectral amplitude (STSA), log-spectral amplitude (LSA), and spectral phase estimation in a widely distributed microphone configuration. The estimators use Rayleigh and Gaussian statistical models for the speech prior and noise likelihood, with a diffuse noise field modeling the surrounding environment. Measured by signal-to-noise ratio (SNR), segmental signal-to-noise ratio (SSNR), log-likelihood ratio (LLR), and Perceptual Evaluation of Speech Quality (PESQ) as objective metrics, the multichannel LSA estimator decreases background noise and speech distortion and increases speech quality compared to the baseline single-channel STSA and LSA estimators. The optimal multichannel spectral phase estimator contributes significantly to these improvements and demonstrates robustness due to time alignment and attenuation factor estimation. Overall, the optimal distributed microphone spectral estimators show strong results in noisy environments, with application to many consumer, industrial, and military products.
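
As an illustration of two of the objective metrics named above, the following is a minimal sketch of global SNR and segmental SNR computed between a clean reference signal and an enhanced signal. It is not the authors' evaluation code. The function names, the frame length of 256 samples, the non-overlapping framing, and the per-frame clamping range of -10 dB to 35 dB are all assumptions chosen for illustration; the abstract does not state the settings used in the paper.

```python
import math


def snr_db(clean, enhanced):
    """Global SNR in dB: clean-signal energy over residual-error energy."""
    signal_energy = sum(s * s for s in clean)
    error_energy = sum((s - e) ** 2 for s, e in zip(clean, enhanced))
    if signal_energy == 0.0:
        raise ValueError("clean signal has zero energy")
    if error_energy == 0.0:
        return math.inf  # enhanced signal matches the reference exactly
    return 10.0 * math.log10(signal_energy / error_energy)


def segmental_snr_db(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs in dB, each clamped to [lo, hi].

    frame_len, lo, and hi are illustrative assumptions, not values
    taken from the paper.
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")
    frame_snrs = []
    # Non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in s)
        error_energy = sum((x - y) ** 2 for x, y in zip(s, e))
        if signal_energy == 0.0:
            continue  # skip frames with no reference energy
        if error_energy == 0.0:
            value = hi  # perfect frame: saturate at the upper clamp
        else:
            value = 10.0 * math.log10(signal_energy / error_energy)
        frame_snrs.append(min(max(value, lo), hi))
    if not frame_snrs:
        raise ValueError("no usable frames")
    return sum(frame_snrs) / len(frame_snrs)
```

Clamping each frame before averaging keeps silent or near-perfect frames from dominating the mean, which is the usual motivation for reporting SSNR alongside global SNR.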
