An Objective Evaluation Framework for Pathological Speech Synthesis

The development of pathological speech systems is currently hindered by the lack of a standardised objective evaluation framework. In this work, (1) we use existing detection and analysis techniques to propose a general framework for the consistent evaluation of synthetic pathological speech. The framework evaluates two aspects of speech, voice quality and intelligibility, and our experiments show that these two evaluations are complementary. (2) Using the proposed evaluation framework, we develop and test a dysarthric voice conversion (VC) system based on CycleGAN-VC and a PSOLA-based speech rate modification technique. We show that the developed system can synthesise dysarthric speech with different levels of speech intelligibility.
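To make the idea of speech rate modification concrete, the following is a minimal sketch of time-scale modification by plain windowed overlap-add. It is not PSOLA and not the authors' implementation: PSOLA places its analysis frames on pitch marks, whereas this simplified stand-in uses a fixed hop and will therefore introduce audible artefacts on real speech. The function name `ola_time_stretch` and the parameters `rate`, `frame_len` and `hop_out` are hypothetical names chosen for illustration.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def ola_time_stretch(signal, rate, frame_len=400, hop_out=100):
    """Change the duration of `signal` by a factor of 1 / rate.

    rate < 1 slows the signal down (longer output), rate > 1 speeds it up.
    Assumption: fixed-hop overlap-add, NOT pitch-synchronous (so not PSOLA).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    window = hann(frame_len)
    hop_in = hop_out * rate                  # analysis hop in the input
    out_len = int(round(len(signal) / rate))
    out = [0.0] * (out_len + frame_len)      # weighted sum of frames
    norm = [0.0] * (out_len + frame_len)     # sum of window weights
    k = 0
    while True:
        start_in = int(round(k * hop_in))
        start_out = k * hop_out
        if start_in >= len(signal) or start_out >= out_len:
            break
        for i in range(frame_len):
            j = start_in + i
            if j >= len(signal):
                break                        # frame runs past the input end
            out[start_out + i] += window[i] * signal[j]
            norm[start_out + i] += window[i]
        k += 1
    # Divide by the accumulated window weight; samples that no frame
    # covered with non-zero weight (e.g. sample 0, and possibly a few
    # samples at the very end) are left at zero.
    return [o / n if n > 1e-8 else 0.0
            for o, n in zip(out[:out_len], norm[:out_len])]


if __name__ == "__main__":
    # 0.1 s of a 100 Hz tone at an assumed 16 kHz sampling rate.
    tone = [math.sin(2.0 * math.pi * 100.0 * t / 16000.0) for t in range(1600)]
    slow = ola_time_stretch(tone, rate=0.5)  # half speed, twice as long
    print(len(tone), len(slow))
```

Slowing speech down with `rate < 1` mimics the reduced speaking rate often associated with dysarthria. A pitch-synchronous method such as PSOLA would be needed to do this without the phase discontinuities that a fixed hop produces.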
