Continuous-speech phone recognition from ultrasound and optical images of the tongue and lips

This article describes a video-only speech recognition system for a “silent speech interface” application, using ultrasound and optical images of the tongue and lips. A one-hour audiovisual speech corpus was phonetically labeled using an automatic speech alignment procedure and robust visual feature extraction techniques. HMM-based stochastic models were estimated separately on the visual and acoustic data. The performance of the visual speech recognition system is compared with that of a traditional acoustic-based recognizer.
