On the Development of an ASR-based Multimedia Game for Speech Therapy: Preliminary Results

A potential consequence of the ageing population is an increased incidence of neurological diseases that cause communicative disorders. In turn, this may lead to an increasing demand for intensive and costly speech therapy. To alleviate this problem, multimedia applications for telerehabilitation and web-based speech training have been developed to support speech therapy. However, due to the repetitive nature of some exercises, patients do not always find therapy motivating. This paper reports on research aimed at developing a multimedia game that incorporates Automatic Speech Recognition (ASR) technology to provide patients with autonomous and motivating practice without the intervention of a speech therapist. Currently, the game includes visual feedback on two dimensions of dysarthric speech that often deviate from healthy speech. To explore the possibility of using ASR technology to provide feedback on dysarthric speech, initial experiments were conducted on available speech databases. The results show that employing ASR is becoming feasible thanks to recent developments in acoustic modelling.
