Emotion Speech Recognition Based on Adaptive Fractional Deep Belief Network and Reinforcement Learning

With the rapid development of human–computer interaction frameworks, the identification of emotion has become an important but challenging task. Speech Emotion Recognition (SER) can be characterized as the extraction of the emotional state of a speaker from their spoken utterances. Emotion detection is difficult for a computer because emotional expression varies from speaker to speaker. To address this challenge, the proposed system is built on an Adaptive Fractional Deep Belief Network (AFDBN) combined with Reinforcement Learning (RL). Pitch chroma, spectral flux, tonal power ratio, and MFCC features are extracted from the speech signal, and the extracted feature vector is then passed to the classifier. Finally, performance is analyzed with standard evaluation metrics and compared against existing systems.
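As an illustration of one of the features named above, the following NumPy-only sketch computes spectral flux, the frame-to-frame change in the magnitude spectrum. This is a minimal, self-contained approximation for exposition (frame length, hop size, and the half-wave rectification are common defaults, not parameters specified by the paper); a full pipeline would extract pitch chroma, tonal power ratio, and MFCCs as well.

```python
import numpy as np

def spectral_flux(signal, frame_len=1024, hop=512):
    """Spectral flux per frame: how much the magnitude spectrum
    changes between consecutive windowed frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    mags = []
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        mags.append(np.abs(np.fft.rfft(frame)))
    mags = np.asarray(mags)
    diffs = np.diff(mags, axis=0)
    # Half-wave rectify so only increases in spectral energy count,
    # then take the per-frame Euclidean norm, normalized by bin count.
    flux = np.sqrt(np.sum(np.maximum(diffs, 0.0) ** 2, axis=1))
    return flux / (frame_len // 2 + 1)

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440 * t)                    # steady tone
    noise = np.random.default_rng(0).normal(size=sr)      # broadband noise
    # A stationary tone changes little between frames; noise changes a lot.
    print(spectral_flux(tone).mean() < spectral_flux(noise).mean())
```

A steady sinusoid yields near-zero flux while noise yields high flux, which is why spectral flux is useful for characterizing how dynamic (e.g., agitated versus calm) an utterance is.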
