Speaker-Characterized Emotion Recognition using Online and Iterative Speaker Adaptation

This paper proposes a novel speech emotion recognition (SER) framework for affective interaction between humans and personal devices. Most conventional SER techniques adopt a speaker-independent model framework because individual speech data are sparse. However, a personal device can accumulate a large amount of individual speech data, which makes it possible to construct speaker-characterized emotion models through speaker adaptation. In this study, to address the problems that conventional adaptation approaches face in SER tasks, we modified a representative adaptation technique, maximum likelihood linear regression (MLLR), on the basis of selective label refinement. We then carried out the modified MLLR procedure in an online and iterative manner on the accumulated individual data to further enhance the speaker-characterized emotion models. In SER experiments on an emotional corpus, our approach outperformed both conventional adaptation techniques and the speaker-independent model framework.
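For reference on the MLLR step mentioned above, the following Python/NumPy sketch shows standard global MLLR mean adaptation for a single diagonal-covariance GMM. It also includes a simple iterative loop that re-aligns the data with the adapted means before re-estimating the transform. This is not the authors' modified procedure. The selective label refinement is not reproduced. The single regression class, the diagonal covariances, and every function name (`gmm_posteriors`, `extend`, `estimate_mllr_mean_transform`, `adapt_means`, `iterative_mllr`) are assumptions made for illustration. Each row of the transform W (D x (D+1)) is solved from the usual per-dimension normal equations, so an adapted mean is W applied to the extended mean [1, μ].

```python
import numpy as np

# Hypothetical sketch: standard global MLLR mean adaptation for one
# diagonal-covariance GMM. It does NOT implement the paper's selective
# label refinement; all names here are illustrative assumptions.

def gmm_posteriors(obs, means, variances, weights):
    """Component posteriors gamma[t, m] for a diagonal-covariance GMM."""
    diff = obs[:, None, :] - means[None, :, :]                   # (T, M, D)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)    # (M,)
    log_lik = -0.5 * (np.sum(diff ** 2 / variances[None], axis=2) + log_det[None])
    log_joint = log_lik + np.log(weights)[None]                  # (T, M)
    log_joint -= log_joint.max(axis=1, keepdims=True)            # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def extend(means):
    """Extended mean vectors xi_m = [1, mu_m] (bias term first)."""
    return np.hstack([np.ones((means.shape[0], 1)), means])


def estimate_mllr_mean_transform(obs, post, means, variances):
    """Solve for W (D x (D+1)) maximizing likelihood with mu_hat = W @ xi."""
    D = obs.shape[1]
    xi = extend(means)                         # (M, D+1)
    occ = post.sum(axis=0)                     # sum_t gamma_m(t), shape (M,)
    first = post.T @ obs                       # sum_t gamma_m(t) o_t, shape (M, D)
    W = np.zeros((D, D + 1))
    for i in range(D):                         # rows decouple for diagonal covariances
        scale = occ / variances[:, i]
        G = (xi * scale[:, None]).T @ xi       # sum_m scale_m xi_m xi_m^T
        k = xi.T @ (first[:, i] / variances[:, i])
        W[i] = np.linalg.solve(G, k)
    return W


def adapt_means(W, means):
    """Apply the transform to every Gaussian mean: mu_hat = b + A mu."""
    return extend(means) @ W.T


def iterative_mllr(obs, means, variances, weights, n_iter=3):
    """Re-align with the current adapted means, then re-estimate W from the original means."""
    adapted = means
    for _ in range(n_iter):
        post = gmm_posteriors(obs, adapted, variances, weights)
        W = estimate_mllr_mean_transform(obs, post, means, variances)
        adapted = adapt_means(W, means)
    return W, adapted
```

In an online setting, one would presumably rerun `iterative_mllr` on the growing pool of the device owner's utterances (or keep accumulating the occupancy and first-order statistics) as new speech arrives. In unsupervised use, the posteriors come from the model's own alignment. That is where label errors enter, and it is where a refinement step such as the one the paper describes would act.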
