Deep Learning Bidirectional LSTM based Detection of Prolongation and Repetition in Stuttered Speech using Weighted MFCC

Stuttering is a neurodevelopmental disorder in which the normal flow of speech is disrupted. Traditionally, speech-language pathologists have assessed the extent of stuttering by manually counting speech disfluencies. Such assessments are subjective, inconsistent, time-consuming, and error-prone. The present study focuses on the objective assessment of speech disfluencies, namely prolongation and syllable, word, and phrase repetition. The proposed method combines the Weighted Mel Frequency Cepstral Coefficient (Weighted MFCC) feature extraction algorithm with a deep-learning Bidirectional Long Short-Term Memory (Bidirectional LSTM) neural network for classifying stuttered events. The work uses the UCLASS stuttering dataset. The speech samples in the database are first pre-processed, manually segmented, and labeled by disfluency type. The labeled samples are then parameterized as Weighted MFCC feature vectors, which are fed to the Bidirectional LSTM network for training and testing. The effect of different hyper-parameters on the classification results is examined. The test results show that the proposed method reaches a best accuracy of 96.67%, exceeding that of the LSTM model. Recognition accuracies of 97.33%, 98.67%, 97.5%, 97.19%, and 97.67% were achieved for the detection of fluent speech, prolongation, syllable repetition, word repetition, and phrase repetition, respectively.
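The abstract does not spell out how the Weighted MFCC vectors are formed. As a rough illustration only, the sketch below assumes one common formulation in which the static coefficients are combined with their first and second time derivatives as c + p·Δc + q·ΔΔc, so the feature dimension stays equal to the number of static coefficients. The function names `delta` and `weighted_mfcc`, the weights `p` and `q`, and the regression window `N` are hypothetical choices for this example, not values taken from the study, and the MFCC matrix is assumed to have been computed already.

```python
import numpy as np


def delta(feats: np.ndarray, N: int = 2) -> np.ndarray:
    """Regression-based time derivative of a (frames, coeffs) feature matrix.

    Uses d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2), n = 1..N,
    with edge frames repeated so the output keeps the input shape.
    """
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    T = feats.shape[0]
    out = np.zeros_like(feats, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / denom


def weighted_mfcc(mfcc: np.ndarray, p: float = 0.5, q: float = 0.25) -> np.ndarray:
    """Combine static, delta, and delta-delta coefficients into one vector per frame.

    Assumption: p and q are illustrative weights, not the ones used in the paper.
    """
    d1 = delta(mfcc)   # first-order dynamics
    d2 = delta(d1)     # second-order dynamics
    return mfcc + p * d1 + q * d2


if __name__ == "__main__":
    # Placeholder input: 100 frames of 13 coefficients standing in for real MFCCs.
    rng = np.random.default_rng(0)
    mfcc = rng.standard_normal((100, 13))
    print(weighted_mfcc(mfcc).shape)
```

Under this assumption, each labeled segment becomes a sequence of fixed-size frame vectors, which is the kind of input a bidirectional recurrent classifier consumes.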
