On Estimation of Time-Varying Variances of Source and Noise for Sensor Array Processing

Estimating the time-varying variances of signals for beamforming in sensor arrays is a challenging problem. Assuming that the array manifold vector and the noise pseudo-coherence matrix are known a priori or can be estimated accurately, this paper presents two estimators of the time-varying variances of the source signal of interest and the noise. The two estimators are then extended to handle two situations: 1) there are multiple candidates for the noise pseudo-coherence matrix, or that matrix is a linear combination of several base pseudo-coherence matrices; and 2) the variance of the estimates is large and smoothing is needed. Simulations for speech enhancement applications show that the proposed estimators track the time-varying variances of both the speech and noise signals well. They also demonstrate that the optimal beamformer using the variance parameters estimated with the presented estimators outperforms the widely used traditional optimal beamformers in terms of improvement in both the signal-to-noise ratio (SNR) and the log-spectral distortion (LSD).
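To make the setting concrete, the following is a minimal sketch and not the estimators proposed in the paper, whose details the abstract does not give. It assumes the common rank-one-plus-noise covariance model R ≈ phi_s * d d^H + phi_v * Gamma, where d is the known array manifold vector and Gamma the known noise pseudo-coherence matrix, and fits the two variances by ordinary least squares in the Frobenius norm. The function name `ls_variance_fit`, the symbol names, and the clipping of negative values to zero are all illustrative assumptions.

```python
import numpy as np


def ls_variance_fit(R, d, Gamma):
    """Least-squares fit of R ~ phi_s * d d^H + phi_v * Gamma.

    Hypothetical helper for illustration only: R is a (possibly noisy)
    Hermitian covariance estimate, d the manifold vector, Gamma the noise
    pseudo-coherence matrix. Returns (phi_s, phi_v), clipped to be >= 0.
    """
    d = np.asarray(d, dtype=complex).reshape(-1)
    R = np.asarray(R, dtype=complex)
    Gamma = np.asarray(Gamma, dtype=complex)

    # Inner products <A, B> = tr(A^H B); real for Hermitian A and B.
    dd = np.real(np.vdot(d, d))            # d^H d
    dGd = np.real(np.vdot(d, Gamma @ d))   # tr(d d^H Gamma)
    dRd = np.real(np.vdot(d, R @ d))       # tr(d d^H R)
    GG = np.real(np.vdot(Gamma, Gamma))    # ||Gamma||_F^2
    GR = np.real(np.vdot(Gamma, R))        # tr(Gamma^H R)

    # 2x2 normal equations of the Frobenius-norm fit.
    gram = np.array([[dd ** 2, dGd], [dGd, GG]])
    rhs = np.array([dRd, GR])
    phi_s, phi_v = np.linalg.solve(gram, rhs)

    # Simple clipping (an assumption, not a constrained solver).
    return max(float(phi_s), 0.0), max(float(phi_v), 0.0)


# Toy check with an exact model: 4 sensors, made-up d and Gamma.
m = np.arange(4)
d = np.exp(-1j * np.pi * m * 0.3)
Gamma = 0.5 ** np.abs(np.subtract.outer(m, m))
R = 2.0 * np.outer(d, d.conj()) + 0.5 * Gamma
phi_s_hat, phi_v_hat = ls_variance_fit(R, d, Gamma)
```

In a time-varying setting, R would typically be replaced by a per-frame or short-time averaged estimate such as y y^H, in which case the fitted variances fluctuate strongly from frame to frame; this is the kind of situation in which the smoothing mentioned in the abstract becomes relevant.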
