Automatic Evaluation of Tracheoesophageal Telephone Speech

The tracheoesophageal (TE) substitute voice is currently the state-of-the-art treatment for restoring the ability to speak after laryngectomy. Intelligibility over the telephone is an important clinical factor, as telephone conversation is a crucial part of patients' social lives. An objective way to rate the intelligibility of substitute voices over the telephone is therefore desirable for improving post-laryngectomy speech therapy. An automatic speech recognition (ASR) system was applied to 41 high-quality recordings of post-laryngectomy patients. The ASR system was trained with normal, non-pathologic speech. It yielded a word accuracy (WA) of 36.9% ± 18.0%; compared with the intelligibility ratings of a group of human experts, the ASR system reached a correlation coefficient of -0.88. After the 41 recordings were downsampled to telephone quality, the ASR system reached a WA of 26.4% ± 13.9% and a correlation coefficient of -0.80. These results confirm that an ASR system can be used for objective intelligibility rating over the telephone.
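The two quantities reported above, word accuracy and the correlation with expert ratings, can be illustrated with a minimal sketch. The abstract does not define WA, so the sketch assumes the commonly used definition WA = 100 · (N − S − D − I) / N, where N is the number of reference words and S, D and I are the substitutions, deletions and insertions in a word-level alignment. It also assumes a Pearson correlation coefficient. The function names and all example values are hypothetical and are not taken from the study.

```python
from math import sqrt


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy in percent, assuming WA = 100 * (N - S - D - I) / N.

    S + D + I is obtained as the word-level edit distance between the
    reference text and the recognizer output.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return 100.0 * (len(ref) - prev[-1]) / len(ref)


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented per-speaker values, for illustration only.
    was = [55.0, 40.0, 30.0, 12.0]   # word accuracy per speaker (%)
    ratings = [1.5, 2.5, 3.0, 4.5]   # hypothetical mean expert scores
    print(pearson(was, ratings))
```

Under this definition WA becomes negative when the recognizer inserts many extra words, which can happen for very poorly recognized speech. A negative coefficient such as those reported would arise if, as assumed in the example data, the experts' scale assigns lower scores to better intelligibility.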
