Mouth gesture and voice command based robot command interface

In this paper we present a voice-command and mouth-gesture based robot command interface capable of controlling three degrees of freedom. The gesture set was designed to avoid head rotation and translation, relying solely on mouth movements. Mouth segmentation is performed using the normalized a* component, as in [1]. Gesture detection is carried out by a Gaussian Mixture Model (GMM) based classifier. A state machine then stabilizes the system response by restricting the set of admissible movements given the current state. Voice commands are modeled using a Hidden Markov Model (HMM) isolated-word recognition scheme. The interface was designed taking into account the specific pose restrictions found in the DaVinci assisted-surgery command console.
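The two-stage decision scheme described above (per-frame GMM classification, then a state machine that only accepts transitions permitted from the current state) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the gesture labels, the toy mixture parameters, and the transition table are all invented here for demonstration; in practice the GMM parameters would be learned from labeled mouth-feature data.

```python
import numpy as np

# Hypothetical gesture labels; the paper's actual gesture set is not reproduced here.
GESTURES = ["neutral", "open", "left", "right"]

# Per-gesture mixture parameters (weights, means, diagonal variances).
# Toy values standing in for parameters estimated from training data:
# gesture i gets a mean near i so the classes are well separated.
rng = np.random.default_rng(0)
params = {
    g: dict(
        w=np.array([0.5, 0.5]),                # mixture weights
        mu=rng.normal(i, 0.1, size=(2, 3)),    # 2 components, 3-D features
        var=np.full((2, 3), 0.2),              # diagonal covariances
    )
    for i, g in enumerate(GESTURES)
}

def gmm_loglik(x, p):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM."""
    diff = x - p["mu"]                                             # (2, 3)
    comp = (-0.5 * np.sum(diff**2 / p["var"]
                          + np.log(2 * np.pi * p["var"]), axis=1)
            + np.log(p["w"]))                                      # (2,)
    m = comp.max()
    return m + np.log(np.exp(comp - m).sum())                      # log-sum-exp

# State machine: from each state only some gestures are accepted,
# which stabilizes the raw classifier output (illustrative table).
ALLOWED = {
    "neutral": {"open", "left", "right"},
    "open":    {"neutral"},
    "left":    {"neutral"},
    "right":   {"neutral"},
}

def step(state, x):
    """Classify x with the GMMs, but only accept transitions ALLOWED from state."""
    scores = {g: gmm_loglik(x, params[g]) for g in GESTURES}
    best = max(scores, key=scores.get)
    return best if best == state or best in ALLOWED[state] else state
```

For example, a feature vector near the "right" class is rejected while the machine is in the "open" state, because "open" only admits a return to "neutral"; the classifier's raw decision is thus filtered by the current state, which is the stabilizing role the abstract attributes to the state machine.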

[1] H. Bourlard et al., "Speech recognition with auxiliary information," IEEE Transactions on Speech and Audio Processing, 2004.

[2] D. Kewley-Port et al., "Evaluation of speech recognizers for speech training applications," IEEE Transactions on Speech and Audio Processing, 1995.

[3] J. H. L. Hansen et al., "Environmental sniffing: noise knowledge estimation for robust speech systems," Proc. IEEE ICASSP, 2003.

[4] B. R. Lee et al., "Laparoscopic visual field. Voice vs foot pedal interfaces for control of the AESOP robot," Surgical Endoscopy, 1998.

[5] R. Hsiao et al., "Improving Reference Speaker Weighting Adaptation by the Use of Maximum-Likelihood Reference Speakers," Proc. IEEE ICASSP, 2006.

[6] C. Baber et al., "Evaluating automatic speech recognition as a component of a multi-input device human-computer interface," Proc. ICSLP, 1996.

[7] Liu Han-bing et al., "Design of Keyword Recognition System over Telephone Channel Based on Multi-band Processing," Proc. 2nd IEEE Conference on Industrial Electronics and Applications, 2007.

[8] T. Redarce et al., "Towards a mouth gesture based laparoscope camera command," Proc. IEEE International Workshop on Robotic and Sensors Environments, 2008.

[9] J. Huang et al., "Automatic speech recognition performance on a voicemail transcription task," IEEE Transactions on Speech and Audio Processing, 2002.

[10] C. P. Gupta et al., Applications of Mathematics, 2007.

[11] R. De Mori et al., "Characterizing Feature Variability in Automatic Speech Recognition Systems," Proc. IEEE ICASSP, 2006.

[12] T. Redarce et al., "Lips Movement Segmentation and Features Extraction in Real Time," 2007.

[13] R. G. Moore et al., "Laparoscopic visual field," Surgical Endoscopy, 1998.

[14] M. Sugisaka et al., "Use of a cellular phone in mobile robot voice control," Proc. 40th SICE Annual Conference, 2001.

[15] J. Huang et al., "Audio-visual speech recognition using an infrared headset," Speech Communication, 2004.

[16] J. Amat et al., "Automatic guidance of an assistant robot in laparoscopic surgery," Proc. IEEE International Conference on Robotics and Automation, 1996.

[17] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, 1989.

[18] A. V. Nefian et al., "Bayesian networks in multimodal speech recognition and speaker identification," Proc. 37th Asilomar Conference on Signals, Systems and Computers, 2003.

[19] T. Redarce et al., "Real-Time Robot Manipulation Using Mouth Gestures in Facial Video Sequences," Proc. BVAI, 2007.

[20] J. H. Martin et al., Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2000.