Machine learning techniques for semantic analysis of dysarthric speech: An experimental study

Abstract: We present an experimental comparison of seven state-of-the-art machine learning algorithms for the task of semantic analysis of spoken input, with a special emphasis on applications for dysarthric speech. Dysarthria is a motor speech disorder characterized by poor articulation of phonemes. To accommodate these non-canonical phoneme realizations, we employ an unsupervised learning approach, which does not require a literal transcription of the training data, to estimate the acoustic models for speech recognition. Even for the subsequent task of semantic analysis, only weak supervision is employed: each training utterance is accompanied by a semantic label only, rather than a literal transcription. Results on two databases, one of which contains dysarthric speech, show that Markov logic networks and conditional random fields substantially outperform the other machine learning approaches. Markov logic networks prove especially robust to the recognition errors caused by imprecise articulation in dysarthric speech.
