Emotion recognition using speech and neural structured learning to facilitate edge intelligence

Abstract Emotions play an important role in our daily communication, and recent years have seen extensive research on reliable emotion recognition systems based on various data sources such as audio and video. Because audio alone carries no visual information about the human face, emotion analysis from only audio data is a challenging task. In this work, a novel emotion recognition approach is proposed based on robust features and machine learning from audio speech. For a person-independent emotion recognition system, audio data is taken as input, from which Mel-Frequency Cepstral Coefficients (MFCC) are computed as features. The MFCC features are then passed through discriminant analysis to minimize the within-class scatter while maximizing the between-class scatter. The robust discriminant features are then fed to Neural Structured Learning (NSL), an efficient and fast deep learning approach, for emotion training and recognition. In experiments on an emotion dataset of audio speech, the proposed combination of MFCC, discriminant analysis, and NSL achieved higher recognition rates than traditional approaches such as MFCC-DBN, MFCC-CNN, and MFCC-RNN. The system can be adopted in smart environments such as homes or clinics to provide affective healthcare. Since NSL is fast and easy to implement, it can be deployed on edge devices with limited datasets collected from edge sensors. Hence, decision making can be pushed towards where the data resides rather than conventionally processing data and making decisions far from the data sources. The proposed approach can be applied in practical settings such as understanding people's emotions in daily life and detecting stress from the voices of pilots or air traffic controllers in air traffic management systems.
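The two learning-related steps of the pipeline can be illustrated with a minimal sketch. The first function implements Fisher discriminant analysis over feature vectors (standing in for the MFCC features), minimizing within-class scatter while maximizing between-class scatter, as described above. The second shows the core idea behind NSL-style training: a graph-regularization term that penalizes distance between embeddings of a sample and its graph neighbors. This is an illustrative NumPy sketch, not the authors' implementation; all function names and the toy regularization weight are assumptions.

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Fisher discriminant analysis: find directions that minimize
    within-class scatter and maximize between-class scatter
    (the paper's discriminant-analysis step on MFCC features)."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter matrix
    Sb = np.zeros((d, d))  # between-class scatter matrix
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    # Solve the generalized eigenproblem Sb v = lambda Sw v and keep
    # the leading discriminant directions.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-eigvals.real)
    return eigvecs[:, order[:n_components]].real

def neighbor_loss(emb, neighbor_emb, alpha=0.1):
    """NSL-style graph regularization (sketch): penalize the squared
    distance between a sample's embedding and its neighbors' embeddings,
    which is added to the supervised loss during training."""
    return alpha * np.mean(np.sum((emb - neighbor_emb) ** 2, axis=1))
```

In the full system this neighbor term would be combined with an ordinary classification loss over the discriminant features, so that nearby samples in the similarity graph receive similar predictions.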