Robust MVDR Beamformer Based on Complex Gaussian Mixture Model With Phase Prior

This paper studies a robust beamforming algorithm for microphone-array speech enhancement in the presence of speaker interference and background noise. Accurate steering vector estimation is essential to beamformer performance and hence to successful speech enhancement. Recently, a time-frequency masking technique based on a complex Gaussian mixture model (CGMM) was proposed to estimate the steering vector for beamforming efficiently. However, its performance degrades when the observations contain noise-only and/or interference-only samples, because the CGMM parameters are then estimated inaccurately. In this paper, a phase prior for the spatial correlation matrix (a CGMM parameter) is proposed to improve steering vector estimation in the presence of speaker interference and background noise. Computer simulations are conducted to verify the advantages of the proposed phase-prior-based beamformer over the conventional beamformer and the CGMM-based approach without a prior.
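
To illustrate why the steering vector matters, the following is a minimal sketch of the standard MVDR weight formula, w = R⁻¹d / (dᴴR⁻¹d), for a single frequency bin, together with one common way of obtaining a steering vector from a time-frequency mask (the principal eigenvector of a mask-weighted spatial correlation matrix). It is not the paper's implementation: the function names, the four-microphone setup, and the synthetic data are assumptions made for illustration, and the proposed phase prior is not modeled here.

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """Standard MVDR weights w = R^{-1} d / (d^H R^{-1} d) for one frequency bin."""
    r_inv_d = np.linalg.solve(noise_cov, steering)   # R^{-1} d without explicit inverse
    return r_inv_d / (steering.conj() @ r_inv_d)     # normalize by d^H R^{-1} d

def steering_from_masked_cov(obs, mask):
    """Hypothetical helper: principal eigenvector of a mask-weighted covariance.

    obs  : (channels, frames) complex STFT values of one frequency bin
    mask : (frames,) values in [0, 1], e.g. target posteriors from a mixture model
    """
    cov = (mask * obs) @ obs.conj().T / max(mask.sum(), 1e-12)
    _, eigvecs = np.linalg.eigh(cov)                 # eigenvalues in ascending order
    return eigvecs[:, -1]                            # unit-norm, defined up to a phase

# Synthetic example (all values are assumptions for illustration only).
rng = np.random.default_rng(0)
n_mics, n_frames = 4, 200
d = np.exp(-1j * np.array([0.0, 0.4, 0.8, 1.2]))     # assumed true steering vector

# Random Hermitian positive-definite noise covariance.
a = rng.standard_normal((n_mics, n_mics)) + 1j * rng.standard_normal((n_mics, n_mics))
noise_cov = a @ a.conj().T + n_mics * np.eye(n_mics)

w = mvdr_weights(noise_cov, d)

# Noise-free target observations: the masked covariance is rank one along d.
s = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)
obs = d[:, None] * s[None, :]
d_est = steering_from_masked_cov(obs, np.ones(n_frames))
```

With an exact steering vector the distortionless constraint wᴴd = 1 holds, so the target passes through unchanged while noise power is minimized. An error in d breaks that constraint, which is why inaccurate masks (and hence inaccurate spatial correlation matrices) translate directly into degraded enhancement.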
