Microphone array post-filtering using supervised machine learning for speech enhancement

High level of noise reduces the perceptual quality and intelligibility of speech. Therefore, enhancing the captured speech signal is important in everyday applications such as telephony and teleconferencing. Microphone arrays are typically placed at a distance from a speaker and require processing to enhance the captured signal. Beamforming provides directional gai nt owards the source of interest and attenuation of interference. It is often followed by a single channel post-filter to further enhance the signal. Non-linear spatial post-filters are capable of providing high noise suppression but can produce unwanted musical noise that lowers the perceptual quality of the output. This work proposes an artificial neural network (ANN) to learn the structure of naturally occurring post-filters to enhance speech from interfering noise. The ANN uses phase-based features obtained from a multichannel array as an input. Simulations are used to train the ANN in a supervised manner. The performance is measured with objective scores from speech recorded in an office environment. The post-filters predicted by the ANN are found to improve the perceptual quality over delay-and-sum beamforming while maintaining high suppression of noise characteristic to spatial post-filters. Index Terms:Speech enhancement, Microphone arrays, Array signal processing, Artificial neural networks, Psychoacoustics.

[1]  Scott Rickard,et al.  Blind separation of speech mixtures via time-frequency masking , 2004, IEEE Transactions on Signal Processing.

[2]  DeLiang Wang,et al.  Towards Scaling Up Classification-Based Speech Separation , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[3]  Jesper Jensen,et al.  An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[4]  DeLiang Wang,et al.  Speech segregation based on sound localization , 2001, IJCNN'01. International Joint Conference on Neural Networks. Proceedings (Cat. No.01CH37222).

[5]  Eric John Diethorn Subband noise reduction methods for speech enhancement , 2000 .

[6]  DeLiang Wang,et al.  Binaural Detection, Localization, and Segregation in Reverberant Environments Based on Joint Pitch and Azimuth Cues , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[7]  Ivan Tashev,et al.  A LOG-MMSE ADAPTIVE BEAMFORMER USING A NONLINEAR SPATIAL FILTER , 2008 .

[8]  Hervé Bourlard,et al.  Microphone array post-filter based on noise field coherence , 2003, IEEE Trans. Speech Audio Process..

[9]  Björn W. Schuller,et al.  Single-channel speech separation with memory-enhanced recurrent neural networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[10]  Ivan Tashev,et al.  Sound Capture and Processing: Practical Approaches , 2009 .

[11]  Nilesh Madhu,et al.  The Potential for Speech Intelligibility Improvement Using the Ideal Binary Mask and the Ideal Wiener Filter in Single Channel Noise Reduction Systems: Application to Auditory Prostheses , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[12]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[13]  DeLiang Wang,et al.  Time-Frequency Masking for Speech Separation and Its Potential for Hearing Aid Design , 2008 .

[14]  Yi Hu,et al.  Evaluation of Objective Quality Measures for Speech Enhancement , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[15]  Petros Maragos,et al.  A generalized estimation approach for linear and nonlinear microphone array post-filters , 2007, Speech Commun..

[16]  Arun Ross,et al.  Microphone Arrays , 2009, Encyclopedia of Biometrics.

[17]  Quoc V. Le,et al.  Recurrent Neural Networks for Noise Reduction in Robust ASR , 2012, INTERSPEECH.

[18]  R. Zelinski,et al.  A microphone array with adaptive post-filtering for noise reduction in reverberant rooms , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[19]  Anja Walter,et al.  Audio Signal Processing For Next Generation Multimedia Communication Systems , 2016 .

[20]  Mark F. Davis Audio and Electroacoustics , 2014 .

[21]  Alex Acero,et al.  Microphone Array Post-Filter using Incremental Bayes Learning to Track the Spatial Distributions of Speech and Noise , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[22]  Bhiksha Raj,et al.  Speech denoising using nonnegative matrix factorization with priors , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[23]  DeLiang Wang,et al.  Ideal ratio mask estimation using deep neural networks for robust speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[24]  Ivan Tashev,et al.  MICROPHONE ARRAY POST-PROCESSOR USING INSTANTANEOUS DIRECTION OF ARRIVAL , 2006 .

[25]  Jont B. Allen,et al.  Image method for efficiently simulating small‐room acoustics , 1976 .

[26]  Nasser M. Nasrabadi,et al.  Pattern Recognition and Machine Learning , 2006, Technometrics.

[27]  Michael S. Brandstein,et al.  Robust Localization in Reverberant Rooms , 2001, Microphone Arrays.