Weak Speech Supervision: A case study of Dysarthria Severity Classification

Machine learning methodologies are making remarkable contributions and yielding state-of-the-art results in different speech domains. Despite these achievements, the need for large amounts of labeled data remains the largest bottleneck in deploying such speech systems. Hand-labeling training data at that scale is intensely laborious. This is especially problematic for clinical applications, where obtaining data labeled by speech pathologists is expensive and time-consuming. To overcome these problems, we introduce a new paradigm called Weak Speech Supervision (WSS), a first-of-its-kind system that helps users train state-of-the-art classification models without hand-labeling training data. Users write labeling functions (i.e., weak rules) to generate weak data from the unlabeled training set. In this paper, we demonstrate the effectiveness of this methodology through a case study on severity-based binary classification of dysarthric speech. In WSS, we train a classifier on trusted data (labeled with 100% accuracy) while also utilizing the weak data (labeled using weak supervision) to improve the classifier's performance. The proposed methodology is analyzed on the Universal Access (UA) corpus. On average, we obtain relative improvements of 35.68% in accuracy and 43.83% in F1-score over the baselines.
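To make the idea of labeling functions concrete, the following is a minimal Python sketch of how weak rules could turn unlabeled utterances into weak data and how that data could be combined with trusted data. It is an illustration under stated assumptions, not the system described in the paper: the feature names (`syllables_per_sec`, `pause_ratio`, `asr_confidence`), the thresholds, the majority-vote aggregation, and the down-weighting of weak samples via `weak_weight` are all hypothetical choices made for this example.

```python
from collections import Counter

# Hypothetical label encoding for binary severity classification.
ABSTAIN, LOW, HIGH = -1, 0, 1


# Hypothetical labeling functions (weak rules). Feature names and
# thresholds are illustrative assumptions, not values from the paper.
def lf_speech_rate(x):
    # Slow speech is treated as a weak cue for high severity.
    if x["syllables_per_sec"] < 2.0:
        return HIGH
    if x["syllables_per_sec"] > 4.0:
        return LOW
    return ABSTAIN


def lf_pause_ratio(x):
    # A large share of pauses is treated as a weak cue for high severity.
    return HIGH if x["pause_ratio"] > 0.5 else ABSTAIN


def lf_asr_confidence(x):
    # High recognizer confidence is treated as a weak cue for low severity.
    return LOW if x["asr_confidence"] > 0.8 else ABSTAIN


LABELING_FUNCTIONS = [lf_speech_rate, lf_pause_ratio, lf_asr_confidence]


def weak_label(x, lfs=LABELING_FUNCTIONS):
    """Majority vote over non-abstaining labeling functions; ties abstain."""
    votes = [v for v in (lf(x) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


def build_training_set(trusted, unlabeled, weak_weight=0.3):
    """Return (features, label, weight) triples.

    Trusted samples keep weight 1.0; weakly labeled samples get the
    smaller, assumed weight `weak_weight`. Samples on which every rule
    abstains (or the vote ties) are dropped.
    """
    data = [(x, y, 1.0) for x, y in trusted]
    for x in unlabeled:
        y = weak_label(x)
        if y != ABSTAIN:
            data.append((x, y, weak_weight))
    return data
```

The resulting triples could be passed to any classifier that accepts per-sample weights, so that the trusted data dominates training while the weak data supplies additional coverage. Whether WSS aggregates rules or weights samples in this way is not stated in the abstract.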
