A Speech Emotion Recognition Solution-based on Support Vector Machine for Children with Autism Spectrum Disorder to Help Identify Human Emotions

Children on the autism spectrum often have difficulty communicating with others. In this work, a speech emotion recognition model was developed to help children with Autism Spectrum Disorder (ASD) identify emotions in social interactions. The model is a Support Vector Machine (SVM) classifier implemented in Python. SVMs have been shown to yield high accuracy when classifying inputs in speech processing. Several audio databases have been designed specifically for training emotion recognition models. One such speech corpus, the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), is used to train the model in this work. Acoustic features are extracted during pre-processing with the libROSA Python library. The first 26 Mel-frequency Cepstral Coefficients (MFCCs) and the zero-crossing rate (ZCR) serve as the acoustic features for training. The final SVM model achieved a test accuracy of 77%. When significant background noise was added to the RAVDESS audio recordings, the model still yielded a test accuracy of 64%.
