Prospects for a Silent Speech Interface using Ultrasound Imaging

The feasibility of a silent speech interface using ultrasound (US) imaging and lip profile video is investigated by examining the quality of the line spectral frequencies (LSFs) derived from the image sequences. The data do not at present allow reliable identification of silences and fricatives, but LSFs recovered from vocalized passages are compatible with the synthesis of intelligible speech.
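The central quantity here is the line spectral frequency representation of a linear-prediction (LPC) synthesis filter. The Python/NumPy code below is a minimal sketch, not the authors' pipeline. It converts an even-order LPC polynomial to LSFs and back, assuming the standard sum/difference polynomial construction. The function names `lpc_to_lsf` and `lsf_to_lpc` are illustrative only.

```python
import numpy as np


def lpc_to_lsf(a):
    """Convert LPC coefficients a = [1, a1, ..., ap] (p even) to p LSFs in radians.

    Uses the sum/difference polynomials
        P(z) = A(z) + z^-(p+1) A(1/z),  Q(z) = A(z) - z^-(p+1) A(1/z),
    whose roots lie on the unit circle when A(z) is minimum phase.
    """
    a = np.asarray(a, dtype=float)
    p = len(a) - 1
    if a[0] != 1.0 or p % 2:
        raise ValueError("expected a[0] == 1 and an even order p")
    a_ext = np.append(a, 0.0)           # pad to length p + 2
    P = a_ext + a_ext[::-1]             # symmetric polynomial
    Q = a_ext - a_ext[::-1]             # antisymmetric polynomial
    # For even p, P has a trivial root at z = -1 and Q one at z = +1; divide them out.
    P_red, _ = np.polydiv(P, [1.0, 1.0])
    Q_red, _ = np.polydiv(Q, [1.0, -1.0])
    angles = np.angle(np.concatenate([np.roots(P_red), np.roots(Q_red)]))
    # Roots come in conjugate pairs; keep one of each (angle in (0, pi)).
    return np.sort(angles[angles > 0])


def lsf_to_lpc(lsf):
    """Rebuild [1, a1, ..., ap] from p LSFs (p even)."""
    lsf = np.sort(np.asarray(lsf, dtype=float))
    P = np.array([1.0, 1.0])            # factor (1 + z^-1)
    Q = np.array([1.0, -1.0])           # factor (1 - z^-1)
    # LSFs interlace: the 1st, 3rd, ... are roots of P; the 2nd, 4th, ... of Q.
    for w in lsf[0::2]:
        P = np.convolve(P, [1.0, -2.0 * np.cos(w), 1.0])
    for w in lsf[1::2]:
        Q = np.convolve(Q, [1.0, -2.0 * np.cos(w), 1.0])
    a = 0.5 * (P + Q)                   # A(z) = (P(z) + Q(z)) / 2
    return a[:-1]                       # trailing coefficient is zero


if __name__ == "__main__":
    a = np.array([1.0, -0.9, 0.5])      # a stable second-order example filter
    lsf = lpc_to_lsf(a)
    print(lsf)                          # approximately [0.795, 1.369]
    print(lsf_to_lpc(lsf))              # recovers [1, -0.9, 0.5]
```

A well-known property of this representation is that any set of distinct, increasing LSFs in (0, π) yields a minimum-phase A(z). Consequently, LSFs estimated from image features can be checked for ordering before they are passed to an LPC synthesizer.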
