PEAKS - A system for the automatic evaluation of voice and speech disorders

We present a novel system for the automatic evaluation of speech and voice disorders. The system is platform-independent and is accessed over the internet. The patient reads a text or names pictures, and the recorded speech is then analyzed by automatic speech recognition and prosodic analysis. For two patient groups, laryngectomees (patients whose larynx was removed because of cancer) and children with cleft lip and palate, we show that the automatic analysis correlates significantly with the judgment of human experts in a leave-one-out experiment (p < .001). We obtained a correlation of .90 for the laryngectomees and .87 for the children's data, which is comparable to human inter-rater correlations.
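The leave-one-out evaluation mentioned above can be illustrated with a minimal sketch. It is not the PEAKS implementation: it assumes a single hypothetical feature (a word accuracy value per speaker), swaps in an ordinary least-squares line as the regression model, and uses invented toy numbers. Each speaker's expert score is predicted by a model fitted on all other speakers, and the predictions are then correlated with the expert scores.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def leave_one_out_predictions(xs, ys):
    """Predict each y[i] from a line fitted on all samples except i."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        preds.append(slope * xs[i] + intercept)
    return preds


# Hypothetical toy data, invented for illustration only:
# one automatic feature and one expert rating per speaker.
word_accuracy = [20.0, 35.0, 40.0, 55.0, 60.0, 75.0, 80.0]
expert_score = [4.5, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5]

predictions = leave_one_out_predictions(word_accuracy, expert_score)
r = pearson(predictions, expert_score)
print(f"leave-one-out correlation with expert scores: {r:.2f}")
```

Because every prediction comes from a model that never saw the speaker being scored, the resulting correlation estimates how well the automatic measure would agree with experts on unseen speakers.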
