Interpretable phonological features for clinical applications

Instrumental analysis of speech sometimes complements subjective evaluation in speech and language therapy. However, apart from elementary speech features such as pitch and formant statistics, higher-dimensional spectral features are rarely used in practice because they are not clinically interpretable. Although these features are likely related to clinical intervention, the relationship remains to be determined. This paper uses recurrent neural networks to map high-dimensional spectral features to phonological features that are easily interpretable and provide fine-resolution information about articulation quality. An evaluation on a dysarthric speech data set shows a strong correlation between the phonological feature measures and perceptual ratings. To increase clinical utility, we also present a new way to visualize phonological disturbances that gives clinicians actionable information about intervention strategies.
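To make the mapping described above more concrete, the following is a minimal sketch of how a recurrent network can turn a sequence of spectral frames into per-frame phonological feature probabilities. It is an illustration under stated assumptions, not the paper's model: the simple Elman-style recurrence, the layer sizes, the random untrained weights, and the function name `rnn_phonological_posteriors` are all hypothetical choices made for this example.

```python
import numpy as np

def rnn_phonological_posteriors(frames, W_in, W_rec, W_out, b_h, b_out):
    """Map a (T, D) sequence of spectral frames to a (T, K) array of
    per-frame phonological feature probabilities.

    Assumption: a plain Elman-style recurrence with tanh hidden units and
    independent sigmoid outputs, one per phonological feature.
    """
    T = frames.shape[0]
    h = np.zeros(W_rec.shape[0])           # initial hidden state
    out = np.empty((T, W_out.shape[0]))
    for t in range(T):
        # The hidden state mixes the current frame with the previous state.
        h = np.tanh(W_in @ frames[t] + W_rec @ h + b_h)
        # Each output is the probability that one feature is active.
        out[t] = 1.0 / (1.0 + np.exp(-(W_out @ h + b_out)))
    return out

# Hypothetical sizes: 39-dim spectral frames, 16 hidden units,
# 13 phonological features, 50 frames of input.
D, H, K, T = 39, 16, 13, 50
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(H, D))
W_rec = rng.normal(scale=0.1, size=(H, H))
W_out = rng.normal(scale=0.1, size=(K, H))
b_h = np.zeros(H)
b_out = np.zeros(K)

frames = rng.normal(size=(T, D))           # stand-in for real spectral features
posteriors = rnn_phonological_posteriors(frames, W_in, W_rec, W_out, b_h, b_out)
print(posteriors.shape)
```

Because every output lies between 0 and 1 and corresponds to a named phonological feature, trajectories of this kind are easier to read than raw spectral coefficients. In practice the weights would be learned from labeled speech rather than drawn at random.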
