Service Robot SCORPIO with Robust Speech Interface

The SCORPIO is a small-size mini-teleoperator mobile service robot for booby-trap disposal. An operator can control it manually through a portable briefcase remote control device using a joystick, keyboard and buttons. This paper describes its speech interface. As an auxiliary function, the remote interface allows a human operator to concentrate sight and/or hands on other, more important operating activities. The speech interface is built on HMM-based acoustic models trained on the SpeechDatE-SK database, a small-vocabulary language model based on a fixed connected-word grammar, and a speech recognition setup adapted for low-resource devices. To improve the robustness of the speech interface in outdoor environments, where the SCORPIO service robot operates, speech enhancement based on the spectral subtraction method, as well as a unique combination of an iterative approach and a modified LIMA framework, was researched, developed and tested on simulated and real outdoor recordings.
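As a rough illustration of the spectral subtraction idea mentioned above, the following sketch applies basic power spectral subtraction to a single frame of magnitude spectra. It is not the paper's enhancement method (the iterative approach and the modified LIMA framework are not reproduced here). The function names `estimate_noise` and `spectral_subtract`, the over-subtraction factor `alpha` and the spectral floor `beta` are assumptions chosen for this example.

```python
import math


def estimate_noise(noise_frames):
    """Estimate a noise magnitude spectrum from frames assumed to be speech-free.

    Each frame is a list of magnitude values, one per frequency bin.
    The estimate is the root of the mean power in each bin.
    """
    n_bins = len(noise_frames[0])
    return [
        math.sqrt(sum(frame[k] ** 2 for frame in noise_frames) / len(noise_frames))
        for k in range(n_bins)
    ]


def spectral_subtract(noisy_mag, noise_mag, alpha=2.0, beta=0.01):
    """Subtract a noise power estimate from one frame of a noisy magnitude spectrum.

    alpha (assumed value) scales the noise power before subtraction.
    beta (assumed value) sets a floor as a fraction of the noisy power,
    so no bin ends up with negative power.
    """
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    cleaned = []
    for y, n in zip(noisy_mag, noise_mag):
        power = y * y - alpha * n * n   # subtract scaled noise power
        floor = beta * y * y            # spectral floor for this bin
        cleaned.append(math.sqrt(max(power, floor)))
    return cleaned


# Hypothetical two-bin example: the first bin holds speech, the second only noise.
noise = estimate_noise([[1.0, 1.0], [1.0, 1.0]])
enhanced = spectral_subtract([3.0, 1.0], noise, alpha=1.0, beta=0.01)
```

In a full pipeline the enhanced magnitudes would be recombined with the noisy phase and transformed back to the time domain before recognition; that step is omitted here to keep the sketch short.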
