Automatic speech recognition (ASR) and its use as a tool for assessment or therapy of voice, speech, and language disorders

Automatic speech recognition (ASR) is commonly regarded as merely a method for transcribing spoken language into written text, and as such rather unreliable and cumbersome. However, owing to great advances in computer technology and informatics methodology, ASR has become quite dependable and easier to handle, and the number of applications has increased considerably. After some introductory background information on ASR, a number of applications of great interest to professionals in voice, speech, and language therapy are pointed out. In the foreseeable future, ASR technology will let a microphone replace the keyboard and mouse in many functions as the human–computer interface, and the computer will talk back via its loudspeaker. It seems important that professionals engaged in the care of oral communication disorders take part in this development so that their clients may get the optimal benefit from this new technology.
