Emotion analysis from speech using temporal contextual trajectories

We propose a framework for speech emotion analysis that maps sequential acoustic features into descriptors that integrate both the temporal ordering of the original features and the discrimination between emotions. Exploiting the topology-preserving property of Self-Organizing Maps and the continuous nature of speech, the proposed framework maps a speech utterance into a temporally ordered trajectory that reflects the emotion activity over time in the underlying utterance. Using a standard emotional speech database, we show that the proposed framework is effective for the analysis, visualization, and classification of emotions.
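The core idea of the abstract, mapping a sequence of frame-level acoustic features onto an ordered path of best-matching units (BMUs) on a trained SOM grid, can be sketched as follows. This is a minimal illustration, not the paper's method: it uses a plain online SOM (the paper builds on time-dependent SOM variants), synthetic data in place of real acoustic frames, and hypothetical parameter choices (grid size, feature dimension, decay schedules).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: an 8x8 SOM over 13-dim frame features
# (e.g. MFCC-like vectors); all data here is synthetic.
grid_h, grid_w, dim = 8, 8, 13
weights = rng.normal(size=(grid_h, grid_w, dim))

def best_matching_unit(weights, x):
    """Return (row, col) of the node whose weight vector is closest to x."""
    d = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(d), d.shape)

def train_som(weights, data, epochs=10, lr0=0.5, sigma0=3.0):
    """Plain online SOM training with linearly decaying rate/neighborhood."""
    rows, cols = np.indices(weights.shape[:2])
    n_steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            frac = t / n_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 1e-3
            br, bc = best_matching_unit(weights, x)
            # Gaussian neighborhood around the BMU on the 2-D grid:
            # nearby nodes move toward x too, which is what preserves topology.
            h = np.exp(-((rows - br) ** 2 + (cols - bc) ** 2) / (2 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
            t += 1
    return weights

# Synthetic "utterance": 50 consecutive frames drifting slowly through
# feature space, standing in for the continuity of real speech.
frames = np.cumsum(rng.normal(scale=0.1, size=(50, dim)), axis=0)
weights = train_som(weights, frames)

# The utterance's descriptor: the temporally ordered trajectory of BMU
# grid coordinates, one per frame.
trajectory = [best_matching_unit(weights, x) for x in frames]
```

Because the SOM preserves topology, acoustically similar frames land on nearby grid nodes, so the smooth drift of the synthetic utterance yields a trajectory of mostly adjacent grid positions that can then be visualized or compared across emotions.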
