Successive relative transfer function identification using single microphone speech enhancement

Distortionless speech extraction in a reverberant environment can be achieved by applying a beamforming algorithm, provided that the relative transfer functions (RTFs) of the sources and the covariance matrix of the noise are known. In this contribution, we consider the challenge of RTF identification in a multi-source scenario. We propose a successive RTF identification (SRI) algorithm, based on the sole assumption that the sources become active successively. The proposed algorithm identifies the RTF of the i-th speech source, assuming that the RTFs of all other sources in the environment and the power spectral density (PSD) matrix of the noise were previously estimated. It is based on the neural network Mix-Max (NN-MM) single-microphone speech enhancement algorithm, followed by a least-squares (LS) system identification method. The proposed RTF estimation algorithm is validated by simulation.
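
The least-squares identification step can be illustrated with a minimal sketch. It assumes the enhanced reference-microphone signal and a second microphone signal are already available as STFT arrays, and it fits one complex gain per frequency bin. The function name `ls_rtf_estimate`, the array shapes, and the synthetic data are assumptions made for illustration only; the NN-MM enhancement stage and the handling of previously identified sources and of the noise PSD matrix are not modeled here, so this is not the paper's full procedure.

```python
import numpy as np

def ls_rtf_estimate(ref_stft, mic_stft, eps=1e-12):
    """Per-frequency LS fit of h(f) in mic(f, t) ~= h(f) * ref(f, t).

    ref_stft, mic_stft: complex arrays of shape (n_freq, n_frames).
    Returns a complex array of shape (n_freq,).
    """
    # Closed-form LS solution per bin:
    # h = sum_t conj(ref) * mic / sum_t |ref|^2
    num = np.sum(np.conj(ref_stft) * mic_stft, axis=1)
    den = np.sum(np.abs(ref_stft) ** 2, axis=1) + eps  # eps guards empty bins
    return num / den

# Synthetic check (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n_freq, n_frames = 4, 200

def complex_noise(shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

ref = complex_noise((n_freq, n_frames))   # stands in for the enhanced reference
true_h = complex_noise(n_freq)            # ground-truth per-bin transfer ratio
mic = true_h[:, None] * ref + 0.01 * complex_noise((n_freq, n_frames))

est_h = ls_rtf_estimate(ref, mic)
max_err = np.max(np.abs(est_h - true_h))
```

With low additive noise, `est_h` should lie close to `true_h`; the fit degrades as the reference estimate becomes less accurate, which is why the quality of the single-microphone enhancement stage matters.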
