Real-time integration of statistical model-based speech enhancement with unsupervised noise PSD estimation using microphone array

We propose a multi-channel speech enhancement technique that integrates beamforming with statistical model-based speech enhancement to extract the target speech clearly, even in very noisy environments. Conventional microphone array-based techniques estimate the speech and noise power spectral densities (PSDs) from the spatial cues of the sound sources; however, their estimation errors increase dramatically when there are many noise sources. The proposed technique combines clean speech models trained in advance with noise PSDs estimated in beamspace to compose observation models, from which a precise Wiener filter is designed. Experiments under adverse noise conditions showed that the proposed technique significantly improved the signal-to-noise ratio (SNR) compared with the conventional microphone array processing technique.
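
As a rough illustration of the final step only, the sketch below computes a textbook per-frequency-bin Wiener gain from a speech PSD and a noise PSD and applies it to one frame of a beamformer output. It is not the filter design used in this work: the function names, the gain floor, and the toy PSD values are all assumptions made for the example, and in the proposed technique the two PSDs would come from the clean speech models and the beamspace noise estimate.

```python
def wiener_gain(speech_psd, noise_psd, gain_floor=0.05):
    """Textbook Wiener gain G = S / (S + N) for each frequency bin.

    gain_floor is an assumed safeguard (not taken from the paper) that
    keeps the gain from collapsing to zero in bins with no speech power.
    """
    gains = []
    for s, n in zip(speech_psd, noise_psd):
        total = s + n
        g = s / total if total > 0.0 else gain_floor
        gains.append(max(g, gain_floor))
    return gains


def apply_gain(spectrum, gains):
    """Scale each complex spectral bin by its real-valued gain."""
    return [g * x for g, x in zip(gains, spectrum)]


# Toy values for three frequency bins (hypothetical, for illustration only).
speech_psd = [4.0, 1.0, 0.0]   # e.g. derived from a clean speech model
noise_psd = [1.0, 1.0, 2.0]    # e.g. estimated in beamspace
beamformer_frame = [1 + 1j, 2 + 0j, 0.5 - 0.5j]

gains = wiener_gain(speech_psd, noise_psd)
enhanced = apply_gain(beamformer_frame, gains)
print(gains)
print(enhanced)
```

Bins where the estimated speech power dominates pass almost unchanged, while bins dominated by noise are attenuated down to the floor, which is why the accuracy of the two PSD estimates determines the quality of the filter.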
