Acoustic Assessment of Disordered Voice with Continuous Speech Based on Utterance-Level ASR Posterior Features

Most previous studies on acoustic assessment of disordered voice focused on extracting perturbation features from isolated vowels produced with steady-state phonation. Natural speech, however, is considered preferable for clinical practice in terms of flexibility, effectiveness and reliability. This paper investigates the application of automatic speech recognition (ASR) technology to disordered voice assessment of Cantonese speakers. A DNN-based ASR system is trained using phonetically rich continuous utterances from normal speakers. Frame-level phone posteriors obtained from the ASR system are found to be strongly correlated with the severity level of voice disorder: phone posteriors in utterances with severe disorder exhibit significantly larger variation than those in utterances with mild disorder. A set of utterance-level posterior features is computed to quantify this variation for pattern recognition. An SVM-based classifier is used to classify an input utterance into the categories of mild, moderate and severe disorder. The two-class classification accuracy for mild versus severe disorder is 90.3%, while significant confusion between mild and moderate disorders is observed. For some subjects with severe voice disorder, the classification results are highly inconsistent across individual utterances. Furthermore, short utterances tend to produce more classification errors.
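To make the idea of utterance-level posterior features concrete, the following is a minimal sketch of how frame-level phone posteriors might be summarized into a few statistics that reflect their variation over an utterance. The abstract does not specify the actual feature set, so the statistics chosen here (mean and standard deviation of the per-frame entropy and of the top posterior) and the function name `utterance_posterior_features` are assumptions for illustration only, not the authors' method.

```python
import math
from statistics import mean, pstdev


def utterance_posterior_features(posteriors):
    """Summarize frame-level phone posteriors into utterance-level statistics.

    posteriors: list of frames; each frame is a list of phone posterior
    probabilities (assumed non-negative and summing to roughly 1).
    The chosen statistics are illustrative assumptions, not the paper's features.
    """
    if not posteriors:
        raise ValueError("need at least one frame")

    entropies = []  # per-frame uncertainty of the phone distribution
    top_probs = []  # per-frame confidence in the most likely phone
    for frame in posteriors:
        entropies.append(-sum(p * math.log(p) for p in frame if p > 0.0))
        top_probs.append(max(frame))

    return {
        "entropy_mean": mean(entropies),
        "entropy_std": pstdev(entropies),
        "top_prob_mean": mean(top_probs),
        "top_prob_std": pstdev(top_probs),
    }
```

A feature vector of this kind, computed per utterance, could then be passed to a standard classifier such as an SVM. Under the observation reported above, utterances with severe disorder would be expected to yield larger spread statistics than utterances with mild disorder.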
