Sound interval detection of multiple sources based on sound directivity

Utterance interval detection is a bottleneck for current speech recognition performance in robots deployed in real, noisy environments. In this work, we use microphone-array sound localization not only to localize sound sources but also to detect the sound intervals of multiple simultaneous sources. In previous work, we implemented and evaluated 3D sound localization using the MUSIC (MUltiple SIgnal Classification) method. Here, we propose a method for detecting sound intervals based on directivity information inferred from the dynamics of the MUSIC spectrogram. The proposed method achieves high sound interval detection accuracy and low insertion rates compared with our previous localization-based results.
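The abstract does not give implementation details, but the core idea (tracking the peak of a MUSIC pseudospectrum frame by frame and declaring a sound interval wherever the peak exceeds a threshold) can be sketched roughly as below. This is a minimal illustration under assumed conditions (a narrowband uniform linear array, a single simulated source, and a simple median-based threshold), not the authors' actual system; all array parameters, the simulated scene, and the thresholding rule are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 8                                           # microphones in an assumed uniform linear array
d = 0.5                                         # element spacing in wavelengths
angles = np.deg2rad(np.arange(-90, 91))         # candidate look directions
# Steering matrix: one column per candidate direction (M x K).
A = np.exp(-2j * np.pi * d * np.arange(M)[:, None] * np.sin(angles)[None, :])

def music_peak(snapshots, n_src=1):
    """Peak of the MUSIC pseudospectrum over the candidate directions."""
    R = snapshots @ snapshots.conj().T / snapshots.shape[1]   # sample covariance
    _, V = np.linalg.eigh(R)                                  # eigenvalues ascending
    En = V[:, :M - n_src]                                     # noise-subspace eigenvectors
    # Pseudospectrum: reciprocal of the steering vectors' noise-subspace energy.
    p = 1.0 / np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)
    return p.max()

# Simulated scene: a source at 30 degrees is active in frames 10..29; the
# remaining frames contain only sensor noise.
a = np.exp(-2j * np.pi * d * np.arange(M) * np.sin(np.deg2rad(30)))
peaks = []
for t in range(50):
    x = (rng.standard_normal((M, 64)) + 1j * rng.standard_normal((M, 64))) / np.sqrt(2)
    if 10 <= t < 30:
        s = rng.standard_normal(64) + 1j * rng.standard_normal(64)
        x += 3.0 * a[:, None] * s[None, :]
    peaks.append(music_peak(x))

# Interval detection: frames whose MUSIC peak clearly exceeds the noise floor
# (here a crude 5x-median threshold stands in for a tuned detector).
thr = 5.0 * np.median(peaks)
active = [t for t, p in enumerate(peaks) if p > thr]
print(active)
```

Because an active source drives the MUSIC peak far above its noise-only level, even this crude threshold recovers the simulated interval; a practical multi-source detector would instead track each spectral peak per direction over time, which is the spectrogram-dynamics idea the abstract refers to.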
