Models for objective evaluation of dysarthric speech from data annotated by multiple listeners

In subjective evaluation of dysarthric speech, inter-rater agreement among clinicians can be low. Disagreement among clinicians results from differences in their perceptual assessment abilities, familiarity with a client, clinical experience, and other factors. Recently, there has been interest in developing signal processing and machine learning models for objective evaluation of subjective speech quality. In this paper, we propose a new method that addresses this problem by collecting subjective ratings from multiple evaluators and modeling the reliability of each annotator within a machine learning framework. In contrast to previous work, our model explicitly captures the dependence of an evaluator's reliability on the speaker being rated. We evaluate the model in a series of experiments on a dysarthric speech database and show that it outperforms similar existing approaches.
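To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of speaker-dependent annotator reliability: each evaluator's ratings are assumed Gaussian around a latent per-utterance score, and the evaluator's precision is indexed by speaker rather than shared globally. The EM-style scheme, the function name, and all variable names are illustrative assumptions.

```python
import numpy as np

def fit_speaker_dependent_reliability(ratings, speaker_of, n_iters=50, eps=1e-6):
    """EM-style estimation of latent scores and rater-by-speaker reliability.

    ratings    : (n_utts, n_raters) float array; np.nan marks missing ratings.
    speaker_of : (n_utts,) int array mapping each utterance to its speaker.
    Returns (latent_scores, precision), where precision[j, s] is rater j's
    estimated reliability (inverse variance) on speaker s.
    """
    n_utts, n_raters = ratings.shape
    n_speakers = int(speaker_of.max()) + 1
    observed = ~np.isnan(ratings)

    # Initialize latent scores with the plain mean over available raters,
    # and all reliabilities to 1 (every rater trusted equally at first).
    mu = np.nanmean(ratings, axis=1)
    precision = np.ones((n_raters, n_speakers))

    for _ in range(n_iters):
        # E-step: latent score = precision-weighted average of the observed
        # ratings, with weights looked up per utterance from the
        # rater-by-speaker reliability table.
        w = precision[:, speaker_of].T                # (n_utts, n_raters)
        w = np.where(observed, w, 0.0)
        mu = (w * np.nan_to_num(ratings)).sum(axis=1) / (w.sum(axis=1) + eps)

        # M-step: re-estimate each rater's variance separately per speaker
        # from the residuals against the current latent scores.
        resid2 = (np.nan_to_num(ratings) - mu[:, None]) ** 2
        for s in range(n_speakers):
            rows = speaker_of == s
            counts = observed[rows].sum(axis=0)
            sums = np.where(observed[rows], resid2[rows], 0.0).sum(axis=0)
            var = np.where(counts > 0, sums / np.maximum(counts, 1), 1.0)
            precision[:, s] = 1.0 / (var + eps)

    return mu, precision

# Tiny synthetic example: 2 speakers, 3 raters; rater 2 is noisy on speaker 1.
rng = np.random.default_rng(0)
speaker_of = np.array([0, 0, 0, 1, 1, 1])
truth = rng.normal(size=6)
noise_sd = np.ones((6, 3))
noise_sd[speaker_of == 1, 2] = 3.0   # rater 2 is unreliable on speaker 1 only
ratings = truth[:, None] + rng.normal(size=(6, 3)) * noise_sd
scores, prec = fit_speaker_dependent_reliability(ratings, speaker_of)
```

Collapsing the speaker index so that each rater keeps a single precision recovers a standard learning-from-crowds baseline; the rater-by-speaker table is what lets a model of this kind express, for example, a listener who is familiar with one client's speech but not another's.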
