Real-time speech recognition system for robotic control applications using an ear-microphone

Abstract: This study is part of ongoing research, started in 2004 at the Naval Postgraduate School (NPS), investigating the development of a human-machine interface command-and-control package for controlling robotic units in operational environments. An ear microphone is used to collect voice-activated commands, providing hands-free control instructions in noisy environments [Kurcan, 2006; Bulbuller, 2006]. This study presents the hardware implementation of a theoretical Isolated Word Recognition (IWR) system designed in an earlier study. The recognizer uses a short-term energy and zero-crossing based speech detection scheme and a discrete hidden Markov model (HMM) recognizer designed to recognize seven isolated words. Mel-frequency cepstrum coefficients (MFCC) are used as the discriminating features in the recognition phase. The hardware implementation uses commercial off-the-shelf (COTS) electronic components and an in-ear microphone, is portable, and costs under $50.00. The speech capture system uses the ear microphone and the Si3000 audio codec to capture and sample speech cleanly. The microprocessor processes the detected speech in real time, and its I/O interfaces work reliably with the audio codec and host computer for sampling and training, without communication problems or data loss. The current implementation requires 1.181 ms to process each 15 ms data frame. Resulting recognition performance averages 73.72%.
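To make the front end of the described pipeline concrete, the following is a minimal C sketch of the two frame-level quantities the abstract names for speech detection: short-term energy and zero-crossing count over a single 15 ms frame. The 8 kHz sample rate, the threshold values, and the simple decision rule are illustrative assumptions, not values taken from the thesis.

```c
/*
 * Minimal sketch (assumed parameters) of the frame features used for
 * speech detection in the abstract: short-term energy and zero-crossing
 * count over one 15 ms frame.
 */
#include <stdio.h>
#include <stdlib.h>

#define SAMPLE_RATE_HZ 8000                               /* assumed codec rate     */
#define FRAME_MS       15                                 /* frame length from text */
#define FRAME_LEN      (SAMPLE_RATE_HZ * FRAME_MS / 1000) /* 120 samples            */

/* Short-term energy: sum of squared samples over the frame. */
static double frame_energy(const short *x, int n)
{
    double e = 0.0;
    for (int i = 0; i < n; i++)
        e += (double)x[i] * (double)x[i];
    return e;
}

/* Zero-crossing count: number of sign changes within the frame. */
static int frame_zero_crossings(const short *x, int n)
{
    int zc = 0;
    for (int i = 1; i < n; i++)
        if ((x[i - 1] >= 0) != (x[i] >= 0))
            zc++;
    return zc;
}

int main(void)
{
    short frame[FRAME_LEN];

    /* Fill the frame with synthetic samples purely for demonstration. */
    for (int i = 0; i < FRAME_LEN; i++)
        frame[i] = (short)(rand() % 2000 - 1000);

    double e  = frame_energy(frame, FRAME_LEN);
    int    zc = frame_zero_crossings(frame, FRAME_LEN);

    /* Illustrative decision rule (placeholder thresholds): high energy marks
     * voiced speech; a high zero-crossing count with moderate energy helps
     * catch low-energy fricatives that pure energy thresholding would miss. */
    const double ENERGY_THRESHOLD = 1.0e6; /* assumed */
    const int    ZC_THRESHOLD     = 40;    /* assumed */
    int is_speech = (e > ENERGY_THRESHOLD) ||
                    (zc > ZC_THRESHOLD && e > ENERGY_THRESHOLD / 10.0);

    printf("energy = %.1f, zero crossings = %d, speech = %d\n", e, zc, is_speech);
    return 0;
}
```

In an embedded deployment such frames would arrive from the codec's sample buffer and the per-frame decisions would feed an endpoint detector that marks word boundaries before MFCC extraction and HMM scoring.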
