Speech Emotion Pattern Recognition Agent in Mobile Communication Environment Using Fuzzy-SVM

In this paper, we propose a speech emotion recognition agent for a mobile communication environment. The agent recognizes five emotional states - neutral, happiness, sadness, anger, and annoyance - from speech captured by a cellular phone in real time. In general, speech transmitted over the mobile network contains both the speaker's environmental noise and network noise, which can cause serious performance degradation by distorting the emotional features of the query speech. To minimize the effect of these noises and thereby improve system performance, we adopt an MA (Moving Average) filter, which has a relatively simple structure and low computational complexity. An SFS (Sequential Forward Selection) feature optimization method is then applied to further improve and stabilize system performance. For a practical application to the call center problem, we built a second emotion engine that distinguishes two emotional states: "agitation", which includes anger, happiness, and annoyance, and "calm", which includes neutral and sadness. Two pattern classification methods, k-NN and Fuzzy-SVM, are compared for emotional state classification. The experimental results indicate that the proposed method provides stable and successful emotion classification, with accuracies of 72.5% over five emotional states and 86.5% over two emotional states.
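The MA filter mentioned above can be illustrated with a minimal sketch. The abstract does not specify the window length or which feature contours are smoothed, so the window size and the example pitch-like contour below are purely illustrative assumptions:

```python
import numpy as np

def moving_average(x, window=3):
    """Smooth a 1-D feature contour (e.g. a pitch or energy track)
    with a simple moving-average filter. The window length is an
    illustrative assumption, not a value taken from the paper."""
    kernel = np.ones(window) / window
    # 'same' keeps the output length equal to the input length
    # (edges are implicitly zero-padded).
    return np.convolve(x, kernel, mode="same")

# Hypothetical noisy feature contour before smoothing.
noisy = np.array([1.0, 5.0, 2.0, 6.0, 3.0, 7.0, 4.0])
smoothed = moving_average(noisy, window=3)
```

Such a filter attenuates frame-to-frame jitter introduced by channel and environmental noise at very low computational cost, which matches the paper's stated motivation for choosing it.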