Multi-channel speaker localization and separation using a model-based GSC and an inertial measurement unit

In this paper, we propose a novel multi-channel algorithm for separating simultaneous speakers when the microphone array is subject to movement. When the microphones are mounted on a person's head, for instance, movements can lead to ambiguities with respect to the sources and to distortions in the processed signal. The proposed system estimates the directions of arrival of the speakers' signals relative to the array and updates these estimates using an inertial measurement unit (IMU). A GMM-based localization model is used to compute the posterior probabilities of source activity in each time-frequency bin, and the model parameters are re-estimated during array movements. A model-based generalized side-lobe canceler (GSC), whose components are continuously updated, is then employed to separate the sources. For various speeds of microphone array rotation, it is demonstrated that the IMU-based system delivers improved speech quality compared to the baseline technique without an IMU.
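To make the two central ideas concrete, the following is a minimal sketch and not the authors' implementation. It assumes a rotation about a single (yaw) axis, static far-field sources, and a one-dimensional Gaussian mixture over angle with a shared standard deviation. The names `rotate_doas`, `source_posteriors`, `sigma_deg`, and `weights` are hypothetical and chosen only for illustration.

```python
import numpy as np


def rotate_doas(doas_deg, yaw_change_deg):
    """Compensate array-relative DOAs for an array rotation (assumed yaw only).

    If the array turns by +yaw, a static source appears shifted by -yaw
    relative to the array. Results are wrapped to [-180, 180).
    """
    doas = np.asarray(doas_deg, dtype=float)
    return (doas - yaw_change_deg + 180.0) % 360.0 - 180.0


def source_posteriors(obs_deg, means_deg, sigma_deg, weights):
    """Posterior probability of each source for each observed angle.

    Uses a Gaussian mixture over angle with one shared standard deviation,
    so the Gaussian normalization constant cancels in the posterior.
    """
    obs = np.asarray(obs_deg, dtype=float)[:, None]
    means = np.asarray(means_deg, dtype=float)[None, :]
    # Wrapped angular difference between each observation and each mean.
    diff = (obs - means + 180.0) % 360.0 - 180.0
    log_w = np.log(np.asarray(weights, dtype=float))[None, :]
    log_lik = -0.5 * (diff / sigma_deg) ** 2 + log_w
    # Subtract the row maximum for numerical stability before exponentiating.
    log_lik -= log_lik.max(axis=1, keepdims=True)
    post = np.exp(log_lik)
    return post / post.sum(axis=1, keepdims=True)


# Hypothetical example: two sources, array turned by 20 degrees (IMU reading).
means = rotate_doas([-30.0, 40.0], yaw_change_deg=20.0)
post = source_posteriors([-50.0, 20.0], means, sigma_deg=10.0, weights=[0.5, 0.5])
```

In a full system, each observation would be a per-bin localization feature rather than a single angle, and the resulting posteriors would feed the GSC update; the sketch only shows how an IMU reading could shift the mixture means before the posteriors are recomputed.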
