Speech Emotion Detection using IoT based Deep Learning for Health Care

Recognizing human emotions is essential to understanding a person's behavior and state of mind, and emotion detection from speech signals has recently begun to receive more attention. This paper proposes a method for detecting human emotions from speech signals and implements it in real time using Internet of Things (IoT) based deep learning for the care of older adults in nursing homes. The research makes the following contributions. First, we implemented a real-time audio IoT system that records the human voice and predicts emotions via deep learning. Second, to improve classification, we designed a model that uses data normalization and data augmentation techniques. Finally, we created an integrated deep learning model, called Speech Emotion Detection (SED), using a 2D convolutional neural network (CNN). The best accuracy achieved by our method was approximately 95%, which outperformed the state-of-the-art approaches we compared against. We further applied the SED model to a live audio sentiment analysis system built on IoT technologies for the care of older adults in nursing homes.
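The abstract does not specify which normalization or augmentation techniques were used, so the following is only a minimal sketch of what such a preprocessing step might look like. It assumes peak normalization, additive Gaussian noise, and a circular time shift, which are common choices for audio; the function names, the parameters `noise_level` and `max_shift`, and the synthetic test tone are all hypothetical and are not taken from the paper.

```python
import math
import random


def peak_normalize(signal):
    """Scale samples so the largest absolute value is 1.0 (silence is returned unchanged)."""
    peak = max((abs(s) for s in signal), default=0.0)
    if peak == 0.0:
        return list(signal)
    return [s / peak for s in signal]


def add_noise(signal, noise_level, rng):
    """Add zero-mean Gaussian noise with standard deviation `noise_level`."""
    return [s + rng.gauss(0.0, noise_level) for s in signal]


def time_shift(signal, shift):
    """Circularly shift the signal by `shift` samples (positive shifts delay it)."""
    if not signal:
        return []
    shift %= len(signal)
    return signal[-shift:] + signal[:-shift] if shift else list(signal)


def augment(signal, rng, noise_level=0.005, max_shift=100):
    """Return the normalized original plus two perturbed copies (assumed augmentations)."""
    clean = peak_normalize(signal)
    noisy = peak_normalize(add_noise(clean, noise_level, rng))
    shifted = time_shift(clean, rng.randint(-max_shift, max_shift))
    return [clean, noisy, shifted]


# Hypothetical input: a 0.1 s, 440 Hz tone sampled at 16 kHz, standing in for recorded speech.
sample_rate = 16000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]
variants = augment(tone, random.Random(0))
```

Each recording here yields three training examples of identical length, so they could be converted to fixed-size 2D features (for example, spectrograms) before being passed to a CNN; that feature step is likewise an assumption and is not shown.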
