RECOGNITION OF EMOTIONAL SPEECH AND SPEECH EMOTION IN FARSI

Emotion in speech can convey information beyond what is available in the text alone. However, it can also degrade automatic speech recognition. In earlier work, we evaluated how anger and grief change speech parameters, namely formant frequencies and pitch frequency, in Farsi. Here, building on those results, we try to improve the accuracy of emotional speech recognition with baseline models. We show that adding parameters such as formant and pitch frequencies to the speech feature vector can improve recognition accuracy. The size of the improvement depends on the parameter type, the number of mixture components, and the emotional condition. Identifying the emotional condition can also help improve speech recognition accuracy. To recognize the emotional condition of speech, formant and pitch frequencies were used successfully in two approaches: maximum likelihood and GMM.
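
The maximum likelihood approach mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one diagonal-covariance Gaussian per emotional condition, fitted on two-dimensional feature vectors (mean pitch and first formant, both in Hz). The function names and all numeric values are hypothetical and chosen only for illustration; they are not data from the paper.

```python
import math
from statistics import mean, pvariance


def fit_gaussian(samples):
    """Estimate the per-dimension mean and variance of a diagonal Gaussian."""
    dims = list(zip(*samples))
    return [mean(d) for d in dims], [pvariance(d) for d in dims]


def log_likelihood(x, model):
    """Log-density of feature vector x under a diagonal Gaussian model."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def classify(x, models):
    """Return the label whose model gives x the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(x, models[label]))


# Made-up illustrative features: (mean pitch in Hz, first formant in Hz).
# These numbers are NOT taken from the paper.
training = {
    "neutral": [(120.0, 500.0), (125.0, 510.0), (118.0, 495.0), (122.0, 505.0)],
    "anger": [(180.0, 560.0), (190.0, 575.0), (175.0, 555.0), (185.0, 570.0)],
    "grief": [(100.0, 470.0), (95.0, 460.0), (105.0, 475.0), (98.0, 465.0)],
}

models = {label: fit_gaussian(samples) for label, samples in training.items()}
print(classify((182.0, 565.0), models))
```

A GMM-based variant would keep the same decision rule but replace each single Gaussian with a weighted mixture of several components, typically trained with the EM algorithm.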
