Paper information - Continuous-speech phone recognition from ultrasound and optical images of the tongue and lips

The article describes a video-only speech recognition system for a “silent speech interface” application, using ultrasound and optical images of the voice organ. A one-hour audiovisual speech corpus was phonetically labeled using an automatic speech alignment procedure and robust visual feature extraction techniques. HMM-based stochastic models were estimated separately on the visual and acoustic corpus. The performance of the visual speech recognition system is compared to a traditional acoustic-based recognizer.

Gérard Chollet | Bruce Denby | Thomas Hueber | Gérard Dreyfus | Maureen Stone
