Dereverberation for Speaker Identification in Meetings

Speaker identification is a well-established research problem, but reverberation remains a major issue in real meeting scenarios. Dereverberation is essential for applications such as speaker identification and speech recognition, as it improves the quality and intelligibility of speech signals corrupted by real reverberant environments. Classical approaches estimate the desired speech signal through dereverberation by beamforming, which is crucial for hands-free distant-speech interaction; however, performance degrades when the beamforming equipment cannot satisfy the constraints of temporal symmetry or structural synchrony under real conditions. In this paper, a new dereverberated merged feature is presented for text-independent speaker identification as a component of a Multiple Distant Microphone (MDM) system used in real meeting scenarios. This scenario poses several challenges: far-field recording, limited and short training and test data, and severe reverberation. To address them, we introduce a dimensionality reduction approach that extracts informative low-dimensional features from four kinds of MDM-based features. Experimental results on reverberated signals processed by the MDM system show the effectiveness of the new features, and the performance evaluation demonstrates the robustness of the proposed approach with short test utterances.
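The abstract does not state which dimensionality reduction technique is applied to the four MDM-based feature streams, so the sketch below is only a minimal illustration of the merge-and-reduce idea, assuming frame-aligned feature matrices and using PCA as a stand-in for the actual reduction step. The function name merge_and_reduce, the component count, and the example stream dimensions are hypothetical and not taken from the paper.

```python
# Minimal sketch of merging several frame-aligned MDM-based feature streams
# and projecting them to a low-dimensional space. PCA is an assumption here,
# used purely for illustration; the paper's actual reduction method may differ.
import numpy as np
from sklearn.decomposition import PCA

def merge_and_reduce(feature_streams, n_components=40):
    """Concatenate frame-aligned feature streams (frames x dims each)
    and reduce the merged representation with PCA."""
    # Stack the feature matrices along the feature axis.
    merged = np.concatenate(feature_streams, axis=1)   # (frames, sum_of_dims)
    # Fit PCA on the merged features and keep the leading components.
    pca = PCA(n_components=n_components)
    reduced = pca.fit_transform(merged)                # (frames, n_components)
    return reduced, pca

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for four MDM-based feature streams
    # (e.g., cepstral features from dereverberated channels), 500 frames each.
    streams = [rng.standard_normal((500, d)) for d in (13, 13, 19, 19)]
    reduced, _ = merge_and_reduce(streams, n_components=40)
    print(reduced.shape)  # (500, 40)
```

The reduced frame-level features would then feed a standard text-independent speaker identification back end (for example, a GMM or i-vector/PLDA model); that back end is not shown here.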
