Sibilant consonants classification comparison with multi‐ and single‐class neural networks

Many children with speech sound disorders cannot pronounce the sibilant consonants correctly. We have developed a serious game, controlled by the child's voice in real time, to help children practice the production of European Portuguese (EP) sibilant consonants. To do this, the game uses a sibilant consonant classifier. Because the game does not require adult supervision, children can practice producing these sounds more often, which may lead to faster improvements in their speech. Recently, deep neural networks have brought considerable improvements in classification across a variety of tasks, from image classification to speech and language processing. Here, we propose using deep convolutional neural networks to classify the EP sibilant phonemes in our serious game for speech and language therapy. We compared the performance of several artificial neural networks that take either Mel frequency cepstral coefficients or log Mel filterbanks as input. Our best deep learning model, a 2D convolutional model with log Mel filterbanks as input features, achieves a classification score of 95.48%. These results are then further improved for specific classes with simple binary classifiers.
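To make the input features concrete, the following is a minimal sketch of how log Mel filterbank energies can be computed from one frame's power spectrum. It is an illustration only, not the pipeline used in this work: the function names, the number of filters, the FFT size, and the sample rate are all assumptions chosen for the example, and the Hz-to-mel formula is one common convention among several.

```python
import math


def hz_to_mel(hz):
    # One common mel-scale convention (assumed here, not taken from the paper).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Build n_filters triangular filters over n_fft // 2 + 1 frequency bins."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points, equally spaced on the mel scale, mapped to Hz.
    edges_hz = [
        mel_to_hz(low + (high - low) * i / (n_filters + 1))
        for i in range(n_filters + 2)
    ]
    # Centre frequency in Hz of every FFT bin.
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        left, center, right = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if left < f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel_energies(power_spectrum, bank, floor=1e-10):
    """Weight the power spectrum by each filter, sum, and take the natural log."""
    return [
        math.log(max(sum(w * p for w, p in zip(row, power_spectrum)), floor))
        for row in bank
    ]


# Hypothetical settings: 26 filters, 512-point FFT, 16 kHz audio.
bank = mel_filterbank(n_filters=26, n_fft=512, sample_rate=16000)
flat_spectrum = [1.0] * (512 // 2 + 1)  # placeholder power spectrum for one frame
features = log_mel_energies(flat_spectrum, bank)
```

Stacking these per-frame vectors over time gives a two-dimensional time-by-frequency array, which is the kind of input a 2D convolutional model can operate on. Mel frequency cepstral coefficients are conventionally obtained by applying a discrete cosine transform to the same log energies.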
