Emotion recognition and conversion based on segmentation of speech in Hindi language

Today the most familiar way of communicating with computers is natural language in text form. Speech is becoming an important alternative because it makes communication with computers and other machines more natural and intelligible. Future human-machine and human-robot dialogues will be dominated by natural speech, which is spontaneous and therefore coloured by emotion. Knowing the user's actual emotion can help a system track the user's behaviour by adapting to his or her inner mental state. Emotion recognition is generally studied within human-machine interaction research, and because emotion adds expressiveness to natural-language speech, interest in the field is strong. In this paper we propose a method for word-based emotion conversion of neutral speech into emotional speech, such as `happy' and `sad', for the Hindi language. The conversion is based on segmenting the speech into words and on the differences in pitch and duration of these words between emotions. The words in a spoken utterance are segmented using the main prosodic features `pitch' and `intensity', together with `zero crossing rate' and duration.
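The abstract names intensity and zero crossing rate among the segmentation features but does not give the procedure. The following is only a minimal sketch of how frame-level intensity (RMS) and zero crossing rate might be computed and how a simple intensity threshold could mark candidate word-like segments. The function names, the frame length, the hop size, the thresholds, and the thresholding rule itself are all assumptions made for illustration; they are not the paper's method, and pitch is not estimated here.

```python
import math


def frame_features(samples, frame_len, hop):
    """Return one (rms_intensity, zero_crossing_rate) pair per frame.

    Hypothetical helper: frame_len and hop are in samples.
    """
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Root-mean-square amplitude as a simple intensity measure.
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Count sign changes between neighbouring samples.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        feats.append((rms, crossings / (frame_len - 1)))
    return feats


def find_segments(feats, rms_threshold, min_frames):
    """Group consecutive frames whose RMS reaches rms_threshold.

    Returns (start_frame, end_frame) pairs, end exclusive. Runs shorter
    than min_frames are dropped. This rule is an assumption, not the
    segmentation rule used in the paper.
    """
    segments = []
    start = None
    for i, (rms, _zcr) in enumerate(feats):
        if rms >= rms_threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_frames:
                segments.append((start, i))
            start = None
    if start is not None and len(feats) - start >= min_frames:
        segments.append((start, len(feats)))
    return segments


# Synthetic stand-in for an utterance: silence, a tone, silence.
tone = [math.sin(2 * math.pi * n / 10) for n in range(200)]
signal = [0.0] * 100 + tone + [0.0] * 100

feats = frame_features(signal, frame_len=20, hop=20)
segments = find_segments(feats, rms_threshold=0.1, min_frames=2)
print(segments)  # [(5, 15)]
```

On real speech the zero crossing rate returned by `frame_features` would typically be combined with intensity and pitch to separate voiced, unvoiced and silent frames; here it is only computed, not used in the decision.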
