Phonet: A Tool Based on Gated Recurrent Neural Networks to Extract Phonological Posteriors from Speech

Many features can be extracted from speech signals for applications such as automatic speech recognition or speaker verification. Pathological speech processing, however, requires features that capture the presence of a disease or the state of the patient and that clinical experts can understand. Phonological posteriors are a group of features that clinicians can interpret and that also carry suitable information about the patient’s speech. This paper presents a tool to extract phonological posteriors directly from speech signals. The proposed method consists of a bank of parallel bidirectional recurrent neural networks that estimate the posterior probabilities of occurrence of different phonological classes. The proposed models detect the phonological classes with accuracies over 90%. In addition, the trained models are available to the research community.
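
The bank-of-detectors idea can be illustrated with a minimal, dependency-free sketch. It is not the released tool or its API: the class names, feature dimension, hidden size, and the sigmoid output per detector are assumptions made for illustration, and the weights are random rather than trained, so the numbers it produces carry no phonological meaning. It only shows the data flow: each phonological class gets its own bidirectional GRU, every detector reads the same sequence of acoustic frames, and each returns one posterior per frame.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Plain GRU with update (z), reset (r) and candidate (h) weights, no biases."""

    def __init__(self, n_in, n_hid, rng):
        self.n_hid = n_hid
        self.W = {g: rand_matrix(n_hid, n_in, rng) for g in "zrh"}   # input weights
        self.U = {g: rand_matrix(n_hid, n_hid, rng) for g in "zrh"}  # recurrent weights

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        c = [math.tanh(a + b) for a, b in zip(matvec(self.W["h"], x), matvec(self.U["h"], rh))]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]

    def run(self, frames):
        h = [0.0] * self.n_hid
        states = []
        for x in frames:
            h = self.step(x, h)
            states.append(h)
        return states


class BiGRUDetector:
    """One detector of the bank: per-frame posterior for a single class."""

    def __init__(self, n_in, n_hid, rng):
        self.fwd = GRUCell(n_in, n_hid, rng)
        self.bwd = GRUCell(n_in, n_hid, rng)
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(2 * n_hid)]

    def posteriors(self, frames):
        f = self.fwd.run(frames)
        b = self.bwd.run(frames[::-1])[::-1]  # backward pass, re-aligned in time
        # Assumed output layer: sigmoid over the concatenated directions.
        return [sigmoid(sum(w * s for w, s in zip(self.w_out, hf + hb)))
                for hf, hb in zip(f, b)]


def extract_posteriors(frames, bank):
    """Run every detector of the bank on the same frame sequence."""
    return {name: det.posteriors(frames) for name, det in bank.items()}


rng = random.Random(0)
CLASSES = ["vocalic", "nasal", "stop"]  # hypothetical class names
N_FEATS, N_HID, N_FRAMES = 4, 3, 6      # hypothetical sizes
bank = {name: BiGRUDetector(N_FEATS, N_HID, rng) for name in CLASSES}
frames = [[rng.gauss(0, 1) for _ in range(N_FEATS)] for _ in range(N_FRAMES)]
post = extract_posteriors(frames, bank)
```

Here `post` maps each class name to a list with one value in (0, 1) per input frame. Keeping one independent binary detector per class means a frame can be active for several classes at once, which is one plausible reason to prefer a bank over a single multi-class output.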
