Wake-up-word detection for robots using spatial eigenspace consistency and resonant curve similarity

In this paper, we propose a method for detecting the wake-up-word (WUW) with a microphone array for human-robot interaction. Two features are used for detection: the consistency of the spatial eigenspaces formed by the speech source at different frequencies, and the resonant curve similarity of the WUW. Each feature is processed and detected separately, and the final result is determined by cascading the individual outcomes using a Bayes risk detector. The proposed method maintains a high recognition rate under very low signal-to-noise ratio (SNR) conditions. In addition, the method can estimate the direction of arrival of the sound source, and the architecture is easy to extend: detectors based on other features can be added to the cascade to further improve the recognition rate.
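To make the cascading idea concrete, the following is a minimal sketch of a serial cascade of Bayes minimum-risk detectors, not the paper's implementation. It assumes that each feature has already been reduced to a single scalar score, that the scores are Gaussian under the "WUW" and "not WUW" hypotheses, and that correct decisions cost nothing. The function names, model parameters, prior, and costs are all hypothetical and chosen only for illustration.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log of the Gaussian density N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def bayes_risk_threshold(p_wuw, cost_false_alarm, cost_miss):
    """Likelihood-ratio threshold that minimizes the expected cost.

    Assumption: correct decisions have zero cost, so the threshold is
    cost_false_alarm * P(not WUW) / (cost_miss * P(WUW)).
    """
    return (cost_false_alarm * (1.0 - p_wuw)) / (cost_miss * p_wuw)


def bayes_risk_detect(score, wuw_model, other_model, threshold):
    """Decide "WUW" when the likelihood ratio exceeds the threshold.

    The comparison is done in the log domain for numerical stability.
    Each model is a hypothetical (mean, std) pair for the feature score.
    """
    log_ratio = gaussian_log_pdf(score, *wuw_model) - gaussian_log_pdf(score, *other_model)
    return log_ratio > math.log(threshold)


def cascade_detect(scores, stages):
    """Serial cascade: accept only if every stage accepts.

    `stages` is a list of (wuw_model, other_model, threshold) tuples,
    one per feature; further detectors are added by appending a stage.
    """
    for score, (wuw_model, other_model, threshold) in zip(scores, stages):
        if not bayes_risk_detect(score, wuw_model, other_model, threshold):
            return False  # reject early; later stages are not evaluated
    return True


# Illustrative setup: equal priors and equal error costs give threshold 1.
threshold = bayes_risk_threshold(p_wuw=0.5, cost_false_alarm=1.0, cost_miss=1.0)
stages = [
    ((0.8, 0.1), (0.3, 0.1), threshold),  # stand-in for the eigenspace-consistency score
    ((0.7, 0.1), (0.2, 0.1), threshold),  # stand-in for the resonant-curve-similarity score
]
accepted = cascade_detect([0.7, 0.75], stages)
rejected = cascade_detect([0.7, 0.4], stages)
```

In this sketch a third feature would be supported by appending another tuple to `stages`, which mirrors the extensibility described above. Raising `cost_false_alarm` relative to `cost_miss` raises the threshold and makes every stage more conservative.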
