3D Joint Speaker Position and Orientation Tracking with Particle Filters

This paper addresses the problem of three-dimensional speaker orientation estimation in a smart-room environment equipped with microphone arrays. A Bayesian approach is proposed to jointly track the location and orientation of an active speaker. The main motivation is that knowledge of the speaker orientation may improve localization performance, and vice versa. Assuming that the sound produced by the speaker originates from the mouth, the center of the head is deduced from the estimated head orientation. Moreover, the elevation angle of the speaker's head can be partly inferred from fast vertical movements of the computed mouth location. To test the performance of the proposed algorithm, a new multimodal dataset has been recorded, in which the corresponding 3D orientation angles are acquired by an inertial measurement unit (IMU) comprising three-axis accelerometers, magnetometers and gyroscopes. The proposed joint algorithm outperforms a two-step approach in terms of both localization and orientation angle precision, demonstrating the superiority of the joint approach.
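
The geometric step mentioned above, deducing the head center from the mouth location and the head orientation, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `head_center_from_mouth`, the angle convention (azimuth measured in the horizontal plane from the x axis, elevation measured upward from that plane, both in radians) and the 0.10 m mouth-to-center offset are all assumptions made here for illustration.

```python
import math


def head_center_from_mouth(mouth, azimuth, elevation, mouth_offset=0.10):
    """Step back from the mouth along the facing direction to the head center.

    mouth        -- (x, y, z) mouth position in meters
    azimuth      -- facing angle in the horizontal plane, radians (assumed convention)
    elevation    -- facing angle above the horizontal plane, radians (assumed convention)
    mouth_offset -- hypothetical distance from head center to mouth, in meters
    """
    # Unit vector pointing in the direction the speaker is facing.
    dx = math.cos(elevation) * math.cos(azimuth)
    dy = math.cos(elevation) * math.sin(azimuth)
    dz = math.sin(elevation)

    # The mouth lies in front of the head center, so subtract the offset.
    mx, my, mz = mouth
    return (mx - mouth_offset * dx,
            my - mouth_offset * dy,
            mz - mouth_offset * dz)


if __name__ == "__main__":
    # Speaker facing along +x with a level head.
    print(head_center_from_mouth((1.0, 2.0, 1.5), azimuth=0.0, elevation=0.0))
```

In a joint tracker, each particle would carry both a position and an orientation hypothesis, so a mapping of this kind ties the two together: an error in the orientation estimate shifts the inferred head center, which is one way to see why estimating both jointly can help.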
