Investigating the Recognition of Non-articulatory Sounds by Using Statistical Tests and Support Vector Machine

People with articulation and phonological disorders need training to plan and execute the sounds of speech. Compared with typically developing children, children with Down syndrome show significantly delayed speech development owing to developmental disabilities, mainly apraxia of speech. In practice, speech therapists plan and conduct training of articulatory and non-articulatory sounds, such as blow production and lip popping, to assist speech production. Mobile applications can be integrated into clinical treatment to extend therapy beyond clinic walls and fixed schedules, reaching more people at any time, and artificial intelligence and machine learning techniques can improve this kind of application. The aim of this pilot study is to assess sound recognition methods that prioritize the training of sounds for speech production, particularly non-articulatory sounds. These methods apply Mel-Frequency Cepstral Coefficients (MFCC) and the Laplace transform to extract features, and traditional statistical tests and a Support Vector Machine (SVM) to recognize sounds. This study also reports experimental results on the effectiveness of the methods over a set of 197 sounds. Overall, the SVM provides higher accuracy.
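The MFCC-plus-SVM pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it computes a simplified single-frame MFCC with NumPy/SciPy (a real system would frame, window, and pre-emphasize the signal), uses scikit-learn's `SVC` as the classifier, and substitutes synthetic signals (a narrow-band tone vs. a broadband noise burst, hypothetical stand-ins) for the recorded non-articulatory sounds.

```python
import numpy as np
from scipy.fftpack import dct
from sklearn.svm import SVC

SR = 16000    # sampling rate in Hz (assumed)
N_FFT = 512   # FFT size, also used here as the frame length
N_MELS = 26   # number of mel filterbank channels
N_CEPS = 13   # cepstral coefficients kept per frame

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mfcc(signal):
    """Simplified single-frame MFCC: power spectrum -> mel filterbank -> log -> DCT."""
    spectrum = np.abs(np.fft.rfft(signal, N_FFT)) ** 2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(SR / 2.0), N_MELS + 2)
    bins = np.floor((N_FFT + 1) * mel_to_hz(mel_pts) / SR).astype(int)
    fbank = np.zeros((N_MELS, N_FFT // 2 + 1))
    for m in range(1, N_MELS + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):        # rising slope of triangular filter
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):       # falling slope
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    log_energies = np.log(np.maximum(fbank @ spectrum, 1e-10))
    return dct(log_energies, norm="ortho")[:N_CEPS]

# Hypothetical stand-ins for two sound classes: a tone-like sound
# (narrow-band energy) vs. a blow-like sound (broadband noise).
rng = np.random.default_rng(0)
t = np.arange(N_FFT) / SR
tones = [np.sin(2 * np.pi * 300 * t) + 0.1 * rng.standard_normal(N_FFT)
         for _ in range(20)]
noises = [rng.standard_normal(N_FFT) for _ in range(20)]

X = np.array([mfcc(s) for s in tones + noises])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="rbf").fit(X, y)  # RBF-kernel SVM on the MFCC features
acc = clf.score(X, y)              # training accuracy of this toy sketch
```

In practice, full MFCC front ends (e.g. `librosa.feature.mfcc`) produce a matrix of coefficients over many overlapping frames per utterance, which is then pooled or padded before classification; held-out evaluation, not training accuracy, should be reported.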
