Tracking of multiple moving speakers with multiple microphone arrays

In this paper we fuse the location estimates from multiple microphone arrays to obtain improved estimates of the positions and velocities of multiple, simultaneously active, moving speakers based on time delays of arrival (TDOAs). Our approach 1) incorporates kinematic information about the moving speakers by using an interacting multiple model (IMM) estimator for each speaker to constrain the evolution of the location measurements; 2) fuses the location estimates of the same speaker from multiple microphone arrays for better acoustical coverage of the sensed environment; and 3) directly accounts for measurement-origin uncertainty, i.e., which measurement comes from which speaker, by using the probabilistic data association (PDA) technique with the IMM estimator. We demonstrate that a network of arrays, combined with estimation techniques widely used in multisensor-multitarget tracking, provides a consistent and coherent way to reduce the uncertainty and ambiguity of measurements. The effectiveness of our approach is illustrated by an extensive simulation study on tracking a single moving speaker and two closely spaced speakers whose trajectories include a crossing segment.
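As a rough illustration of why fusing per-array location estimates can reduce uncertainty, the sketch below combines two Gaussian estimates by inverse-covariance weighting. This is only an assumed fusion rule for illustration; the abstract does not specify the fusion equations used in the paper, and the function name `fuse_estimates` and all numeric values are hypothetical.

```python
import numpy as np

def fuse_estimates(means, covariances):
    """Fuse independent Gaussian estimates by inverse-covariance weighting.

    Assumption: the per-array estimation errors are mutually independent.
    """
    # Information (inverse covariance) matrix of each array's estimate.
    infos = [np.linalg.inv(np.asarray(P, dtype=float)) for P in covariances]
    # Information adds across independent estimates.
    fused_cov = np.linalg.inv(sum(infos))
    # Each mean is weighted by its own information matrix.
    weighted = sum(I @ np.asarray(m, dtype=float) for I, m in zip(infos, means))
    return fused_cov @ weighted, fused_cov

# Two hypothetical arrays, each accurate along a different axis (metres).
x1, P1 = np.array([1.0, 2.0]), np.diag([0.01, 1.0])
x2, P2 = np.array([1.4, 2.2]), np.diag([1.0, 0.01])
x_fused, P_fused = fuse_estimates([x1, x2], [P1, P2])
```

In this toy setup the fused variance along each axis is smaller than the better of the two individual variances, which is the intuition behind using several arrays for better coverage.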
