Speech-based localization of multiple persons for an interface robot

Robots can be conveniently controlled by a human operator through spoken commands, since voice is a natural communication medium for humans. To carry out a command successfully, a robot needs to know which of possibly many people gave the command and where that person is located. In this paper, we present a particle-filter-based algorithm for localizing multiple speakers in an environment where only one person speaks at a time. The algorithm uses person-specific voice features (vowel formant frequencies) to distinguish between the speakers. The voice features are supported by azimuth angle measurements obtained from a pair of microphones. We test our approach using the microphone system of the Philips iCat interface robot.
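The abstract describes the approach only at a high level. The following is a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes (1) a far-field model in which the azimuth follows from the time delay τ between the two microphones as θ = arcsin(cτ/d); (2) each known person's voice is modeled by independent Gaussians over vowel formant frequencies (F1, F2); (3) persons move by a Gaussian random walk in azimuth; and (4) a plain sampling-importance-resampling (SIR) particle filter in which each particle holds one azimuth per person and, because only one person speaks at a time, the unknown speaker identity is summed out in the weight. All names (`MultiSpeakerFilter`, `azimuth_from_tdoa`, `formant_likelihood`) and all numeric values (noise levels, microphone spacing, formant means) are hypothetical.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def azimuth_from_tdoa(tdoa_s: float, mic_distance_m: float) -> float:
    """Far-field azimuth (rad) from the inter-microphone time delay.

    Uses theta = asin(c * tau / d); the sign convention (positive delay ->
    positive azimuth) is an assumption of this sketch.
    """
    ratio = SPEED_OF_SOUND * tdoa_s / mic_distance_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp noise-induced |ratio| > 1
    return math.asin(ratio)


def gaussian(x: float, mean: float, std: float) -> float:
    """Univariate normal density."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def formant_likelihood(formants, model) -> float:
    """p(formants | person): independent Gaussians per formant (assumed model)."""
    p = 1.0
    for f, (mean, std) in zip(formants, model):
        p *= gaussian(f, mean, std)
    return p


class MultiSpeakerFilter:
    """Hypothetical SIR particle filter over the azimuths of K known persons.

    Each particle holds one azimuth per person. Because only one person
    speaks at a time, the identity of the current speaker is summed out
    in the particle weight.
    """

    def __init__(self, voice_models, n_particles=500, motion_std=0.05,
                 azimuth_std=0.1, seed=0):
        self.models = voice_models      # one formant model per person
        self.k = len(voice_models)
        self.motion_std = motion_std    # random-walk step per update (rad)
        self.azimuth_std = azimuth_std  # azimuth measurement noise (rad)
        self.rng = random.Random(seed)
        self.particles = [[self.rng.uniform(-math.pi / 2, math.pi / 2)
                           for _ in range(self.k)]
                          for _ in range(n_particles)]

    def update(self, measured_azimuth: float, formants) -> None:
        # 1. Predict: each person's azimuth follows a Gaussian random walk.
        for p in self.particles:
            for i in range(self.k):
                p[i] += self.rng.gauss(0.0, self.motion_std)
        # 2. Voice evidence for each candidate speaker (uniform prior).
        voice = [formant_likelihood(formants, m) for m in self.models]
        # 3. Weight = sum over persons k of p(formants | k) * p(azimuth | theta_k).
        weights = [sum(voice[i] * gaussian(measured_azimuth, p[i], self.azimuth_std)
                       for i in range(self.k))
                   for p in self.particles]
        if sum(weights) == 0.0:
            return  # no particle explains the measurement; keep predicted set
        # 4. Multinomial resampling in proportion to the weights.
        chosen = self.rng.choices(self.particles, weights=weights,
                                  k=len(self.particles))
        # Copy so that duplicated particles evolve independently afterwards.
        self.particles = [list(p) for p in chosen]

    def estimate(self):
        """Posterior-mean azimuth for each person."""
        n = len(self.particles)
        return [sum(p[i] for p in self.particles) / n for i in range(self.k)]


if __name__ == "__main__":
    # Two hypothetical persons with assumed (F1, F2) models in Hz: (mean, std).
    models = [[(500.0, 60.0), (1500.0, 120.0)],
              [(300.0, 40.0), (2300.0, 150.0)]]
    d = 0.2  # hypothetical microphone spacing (m)
    true_az = [0.6, -0.4]
    pf = MultiSpeakerFilter(models)
    for _ in range(20):
        # Person 0 speaks, then person 1; delays simulated from true azimuths.
        tau0 = d * math.sin(true_az[0]) / SPEED_OF_SOUND
        pf.update(azimuth_from_tdoa(tau0, d), [510.0, 1480.0])
        tau1 = d * math.sin(true_az[1]) / SPEED_OF_SOUND
        pf.update(azimuth_from_tdoa(tau1, d), [305.0, 2290.0])
    print(pf.estimate())  # should be close to [0.6, -0.4]
```

Summing over persons in the weight lets each utterance mainly update the azimuth of whichever person the formants point to, while the other persons' azimuths simply drift under the motion model. The sketch omits voice-activity detection, unknown speakers, and the training of the formant models.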

[24] Amos Storkey,et al. Advances in Neural Information Processing Systems 20 , 2007 .

[25] G. E. Peterson,et al. Control Methods Used in a Study of the Vowels , 1951 .

[26] Yong Rui,et al. Real-time speaker tracking using particle filter sensor fusion , 2004, Proceedings of the IEEE.

[27] Javier Nicolás Sánchez,et al. Robust global localization using clustered particle filtering , 2002, AAAI/IAAI.

[28] Patrick Pérez,et al. Sequential Monte Carlo methods for multiple target tracking and data fusion , 2002, IEEE Trans. Signal Process..

[29] Simon J. Godsill,et al. On sequential simulation-based methods for Bayesian filtering , 1998 .

[30] Timothy J. Robinson,et al. Sequential Monte Carlo Methods in Practice , 2003 .

[31] Wolfram Burgard,et al. MINERVA: A Tour-Guide Robot that Learns , 1999, KI.

[32] M. S. Brandstein. A pitch-based approach to time-delay estimation of reverberant speech , 1997, Proceedings of 1997 Workshop on Applications of Signal Processing to Audio and Acoustics.

[33] Ben J. A. Kröse,et al. Jijo-2: An Office Robot that Communicates and Learns , 2001, IEEE Intell. Syst..

[34] Ian C. Bruce,et al. Robust Formant Tracking for Continuous Speech With Speaker Variability , 2003, IEEE Transactions on Audio, Speech, and Language Processing.

[35] Michael S. Brandstein,et al. A robust method for speech signal time-delay estimation in reverberant rooms , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[36] John H. L. Hansen,et al. Discrete-Time Processing of Speech Signals , 1993 .

[37] David Gerhard,et al. Pitch Extraction and Fundamental Frequency: History and Current Techniques , 2003 .

[38] Maurizio Omologo,et al. Use of the crosspower-spectrum phase in acoustic event location , 1997, IEEE Trans. Speech Audio Process..

[39] Hiroaki Kitano,et al. Real-time sound source localization and separation for robot audition , 2002, INTERSPEECH.

[40] Noboru Ohnishi,et al. Self-organization of a sound source localization robot by perceptual cycle , 2002, Proceedings of the 9th International Conference on Neural Information Processing, 2002. ICONIP '02..

[41] Ben Kröse,et al. A User-Interface Robot for Ambient Intelligent Environments , 2003 .

[42] Simon Maskell,et al. Fast mutual exclusion , 2004, SPIE Defense + Commercial Sensing.

[43] Zoubin Ghahramani,et al. Learning Dynamic Bayesian Networks , 1997, Summer School on Neural Networks.

[44] G. Carter,et al. The generalized correlation method for estimation of time delay , 1976 .

[45] A.K. Swain,et al. Estimation of LPC parameters of speech signals in noisy environment , 2004, 2004 IEEE Region 10 Conference TENCON 2004..

[46] Michael Isard,et al. CONDENSATION—Conditional Density Propagation for Visual Tracking , 1998, International Journal of Computer Vision.

[47] Y. Bar-Shalom. Tracking and data association , 1988 .

[48] Gregory D. Hager,et al. Joint probabilistic techniques for tracking objects using multiple visual cues , 1998, Proceedings. 1998 IEEE/RSJ International Conference on Intelligent Robots and Systems. Innovations in Theory, Practice and Applications (Cat. No.98CH36190).

[49] Maurizio Omologo,et al. Acoustic event localization using a crosspower-spectrum phase based technique , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[50] David Gerhard. Silence as a cue to rhythm in the analysis of speech and song , 2003 .

[51] Yong Rui,et al. New direct approaches to robust sound source localization , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[52] Maurizio Omologo,et al. Acoustic source location in a three-dimensional space using crosspower spectrum phase , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[53] Gradje KlaassenWojciech,et al. Speech-based localization ofmultiple persons foran interface robot , 2005 .

[54] Andrew Blake,et al. Nonlinear filtering for speaker tracking in noisy and reverberant environments , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[55] Jean Rouat,et al. Robust sound source localization using a microphone array on a mobile robot , 2003, Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453).

[56] Stanley T. Birchfield,et al. Acoustic source direction by hemisphere sampling , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[57] Maurizio Omologo,et al. Talker localization and speech enhancement in a noisy environment using a microphone array based acquisition system , 1993, EUROSPEECH.

[58] Wolfram Burgard,et al. People Tracking with Mobile Robots Using Sample-Based Joint Probabilistic Data Association Filters , 2003, Int. J. Robotics Res..

[59] Fredrik Gustafsson,et al. Monte Carlo data association for multiple target tracking , 2001 .

[60] Wolfram Burgard,et al. Tracking multiple moving targets with a mobile robot using particle filters and statistical data association , 2001, Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No.01CH37164).