Paper Information - Classification of Pathological Speech Using Fusion of Multiple Subsystems

Classification of Pathological Speech Using Fusion of Multiple Subsystems

Pathological speech usually refers to speech distortion resulting from atypicalities in voice and/or in the articulatory mechanisms owing to disease, illness, or other physical or biological insult to the production system. While automatic evaluation of speech intelligibility and quality could assist in diagnosis and treatment design in these scenarios, the many sources and types of variability often make it a very challenging computational processing problem. In this work we design multiple subsystems to address different aspects of pathological speech characteristics. These subsystems are then fused at the binary hard score level (intelligible or not intelligible) using Bayesian networks. Results show that subsystems, such as the multiple language phoneme probability system, the prosodic and intonational subsystem, and the voice quality and pronunciation subsystem, have discriminating power for intelligibility (9.8%, 17.1%, and 14.6% higher than chance, respectively). Noise-Majority based fusion shows 66.4% accuracy, but fusion yields no performance improvement. In addition, voice clustering based joint classification is applied to minimize misclassification by the best subsystem, and it shows the best classification accuracy (79.9% on the dev set, 76.8% on the test set).
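To make the idea of fusing subsystems at the binary hard score level concrete, the following is a minimal sketch of a plain majority vote over per-subsystem decisions. It is not the Bayesian network fusion described in the abstract, and it does not reproduce the Noise-Majority scheme. The function names (`majority_vote`, `fuse_utterances`, `accuracy`) and all decision values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def majority_vote(decisions: Sequence[int]) -> int:
    """Fuse binary hard decisions (1 = intelligible, 0 = not intelligible)."""
    if not decisions:
        raise ValueError("need at least one subsystem decision")
    if any(d not in (0, 1) for d in decisions):
        raise ValueError("decisions must be 0 or 1")
    # Strict majority wins; a tie (even number of subsystems) falls back to 0.
    return 1 if 2 * sum(decisions) > len(decisions) else 0


def fuse_utterances(per_subsystem: Sequence[Sequence[int]]) -> List[int]:
    """Fuse decisions utterance by utterance.

    per_subsystem[i][j] is the hard decision of subsystem i for utterance j.
    """
    if not per_subsystem:
        raise ValueError("need at least one subsystem")
    n_utterances = len(per_subsystem[0])
    if any(len(row) != n_utterances for row in per_subsystem):
        raise ValueError("all subsystems must score the same utterances")
    return [majority_vote(votes) for votes in zip(*per_subsystem)]


def accuracy(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of utterances whose fused decision matches the reference label."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("predicted and reference must be non-empty and aligned")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Hypothetical hard decisions from three subsystems on four utterances.
phoneme_decisions = [1, 0, 1, 1]
prosody_decisions = [1, 1, 0, 1]
voice_decisions = [0, 0, 1, 1]
reference_labels = [1, 0, 0, 1]  # made-up labels, not from the paper

fused = fuse_utterances([phoneme_decisions, prosody_decisions, voice_decisions])
print(fused)                              # [1, 0, 1, 1]
print(accuracy(fused, reference_labels))  # 0.75
```

A majority vote treats every subsystem as equally reliable. A learned fusion model such as a Bayesian network can instead weight subsystems according to how their decisions co-occur with the true label.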

Naveen Kumar | Shrikanth S. Narayanan | Ming Li | Jangwon Kim | Andreas Tsiartas
