Instantaneous PSD Estimation for Speech Enhancement based on Generalized Principal Components

Power spectral density (PSD) estimates of various microphone signal components are essential to many speech enhancement procedures. As speech is highly nonstationary, performance may improve if the PSD estimates preserve temporal variations. In this paper, we propose an instantaneous PSD estimation approach based on generalized principal components. As in other eigenspace-based PSD estimation approaches, we rely on recursive averaging to obtain an estimate of the microphone signal correlation matrix, which is then decomposed. However, instead of estimating the PSDs directly from the temporally smooth generalized eigenvalues of this matrix, which yields temporally smooth PSD estimates, we propose to estimate the PSDs from newly defined instantaneous generalized eigenvalues, which yields instantaneous PSD estimates. The instantaneous generalized eigenvalues are defined from the generalized principal components, i.e., a generalized eigenvector-based transform of the microphone signals. We further show that the smooth generalized eigenvalues can be understood as a recursive average of the instantaneous generalized eigenvalues. Simulation results comparing the multi-channel Wiener filter (MWF) with smooth and instantaneous PSD estimates indicate better speech enhancement performance for the latter. A MATLAB implementation is available online.
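
The relation between smooth and instantaneous generalized eigenvalues can be illustrated numerically. The following is a minimal sketch in Python with NumPy, not the authors' MATLAB implementation. The abstract gives no formulas, so the definitions used here are assumptions: the generalized eigenvectors P are scaled such that P^H R_v P = I, the generalized principal components are z = P^H y, and the instantaneous generalized eigenvalues are |z|^2. The signal model, the forgetting factor alpha, and all names (gevd, run_demo, R_v, a) are hypothetical and chosen for illustration only.

```python
import numpy as np

def gevd(R_y, R_v):
    """Solve R_y P = R_v P diag(lam), with P scaled so that P^H R_v P = I."""
    L = np.linalg.cholesky(R_v)              # R_v = L L^H
    L_inv = np.linalg.inv(L)
    W = L_inv @ R_y @ L_inv.conj().T         # whitened correlation matrix
    W = 0.5 * (W + W.conj().T)               # remove rounding asymmetry
    lam, U = np.linalg.eigh(W)               # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]            # reorder to descending
    P = L_inv.conj().T @ U[:, order]         # generalized eigenvectors
    return lam[order], P

def run_demo(n_mics=4, n_frames=200, alpha=0.9, seed=0):
    """Track smooth and instantaneous generalized eigenvalues over frames."""
    rng = np.random.default_rng(seed)
    # Hypothetical noise correlation matrix, assumed known in this sketch.
    B = rng.standard_normal((n_mics, n_mics)) + 1j * rng.standard_normal((n_mics, n_mics))
    R_v = B @ B.conj().T / n_mics + np.eye(n_mics)
    L_v = np.linalg.cholesky(R_v)
    # Hypothetical transfer function vector of a single source.
    a = rng.standard_normal(n_mics) + 1j * rng.standard_normal(n_mics)
    R_y = R_v.copy()                         # initial correlation estimate
    smooth, inst, carried = [], [], []
    for l in range(n_frames):
        # Nonstationary source: the amplitude varies slowly over frames.
        s = (rng.standard_normal() + 1j * rng.standard_normal()) * (1.0 + np.sin(0.1 * l))
        w = (rng.standard_normal(n_mics) + 1j * rng.standard_normal(n_mics)) / np.sqrt(2)
        y = a * s + L_v @ w                  # microphone signals in one bin
        R_prev = R_y
        R_y = alpha * R_prev + (1 - alpha) * np.outer(y, y.conj())  # recursive averaging
        lam, P = gevd(R_y, R_v)
        z = P.conj().T @ y                   # generalized principal components
        smooth.append(lam)                   # smooth generalized eigenvalues
        inst.append(np.abs(z) ** 2)          # assumed instantaneous eigenvalues
        # diag(P^H R_prev P): the previous estimate seen in the current eigenbasis
        carried.append(np.real(np.einsum("ij,ij->j", P.conj(), R_prev @ P)))
    return np.array(smooth), np.array(inst), np.array(carried), alpha

smooth, inst, carried, alpha = run_demo()
# The smooth eigenvalues equal one recursive-averaging step applied to the
# instantaneous ones, up to rounding error.
max_dev = np.max(np.abs(smooth - (alpha * carried + (1 - alpha) * inst)))
print("max deviation from the recursive-average relation:", max_dev)
```

Under these assumptions, each smooth generalized eigenvalue equals alpha times the previous correlation estimate projected onto the current generalized eigenvector, plus (1 - alpha) times the instantaneous generalized eigenvalue. This follows from diag(P^H R_y P) = diag(Lambda) and is one way to read the recursive-average interpretation stated in the abstract; the paper's own derivation may differ in its details.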
