Development of speechreading supplements based on automatic speech recognition

In manual-cued speech (MCS), a speaker produces hand gestures to resolve ambiguities among speech elements that are often confused by speechreaders. The shape of the hand distinguishes among consonants; the position of the hand relative to the face distinguishes among vowels. Experienced receivers of MCS achieve nearly perfect reception of everyday connected speech. MCS has been taught to very young deaf children and greatly facilitates language learning, communication, and general education. This manuscript describes a system that can produce a form of cued speech automatically in real time and reports on its evaluation by trained receivers of MCS. Cues are derived by a hidden Markov model (HMM)-based, speaker-dependent phonetic speech recognizer that uses context-dependent phone models, and they are presented visually by superimposing animated handshapes on the face of the talker. The benefit provided by these cues depends strongly on the articulation of the hand movements and on precise synchronization between the actions of the hands and the face. Using the system reported here, experienced cue receivers can correctly recognize roughly two-thirds of the keywords in cued low-context sentences, compared with roughly one-third by speechreading alone (SA). The practical significance of these improvements is that they can support fairly normal rates of reception of conversational speech, a task that is often difficult via SA.
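To make the cue-generation stage concrete, the sketch below shows one way the mapping described above could be organized: a phonetic recognizer emits time-stamped phones, consonants select a handshape, vowels select a hand position relative to the face, and consonant-vowel pairs are merged into single cues carrying the timing needed to synchronize the animated hand with the talker's face. This is a minimal illustrative sketch, not the authors' implementation; the phone symbols, the partial cue tables, and the Cue structure are assumptions for the example (real Cued Speech assigns every English consonant to one of eight handshapes and every vowel to one of four positions).

```python
from dataclasses import dataclass

# Partial, illustrative tables (assumed): real Cued Speech covers all
# English consonants with 8 handshapes and all vowels with 4 positions.
HANDSHAPE = {"p": 1, "d": 1, "k": 2, "v": 2, "s": 3, "r": 3, "b": 4, "n": 4}
POSITION = {"aa": "side", "iy": "mouth", "uw": "chin", "eh": "throat"}

@dataclass
class Cue:
    handshape: int   # which handshape to render
    position: str    # where to place the hand relative to the face
    start: float     # onset time in seconds, used for hand/face synchronization
    end: float

def phones_to_cues(phones):
    """Merge each consonant with a following vowel into a single CV cue.

    Input is a list of (phone, start, end) triples as a recognizer might
    emit. A consonant with no following vowel gets a default position, and
    an isolated vowel gets a neutral handshape (handshape 5 is a stand-in).
    """
    cues, i = [], 0
    while i < len(phones):
        sym, t0, t1 = phones[i]
        if sym in HANDSHAPE:
            shape = HANDSHAPE[sym]
            # Look ahead: a consonant-vowel pair becomes one cue spanning both.
            if i + 1 < len(phones) and phones[i + 1][0] in POSITION:
                vsym, _, vt1 = phones[i + 1]
                cues.append(Cue(shape, POSITION[vsym], t0, vt1))
                i += 2
                continue
            cues.append(Cue(shape, "side", t0, t1))     # consonant alone
        elif sym in POSITION:
            cues.append(Cue(5, POSITION[sym], t0, t1))  # vowel alone
        i += 1
    return cues

if __name__ == "__main__":
    # Time-stamped phones for a short utterance (illustrative values).
    phones = [("s", 0.00, 0.12), ("iy", 0.12, 0.30),
              ("d", 0.35, 0.45), ("uw", 0.45, 0.70)]
    for cue in phones_to_cues(phones):
        print(cue)
```

The start and end times carried on each Cue matter because, as noted above, the benefit of the cues depends on precise synchronization between the rendered hand and the talker's face; a renderer would schedule each handshape against the video timeline using these values.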
