Cross-Lingual Vocal Emotion Recognition in Five Native Languages of Assam Using Eigenvalue Decomposition

This work investigates whether vocal expressions of full-blown discrete emotions can be recognized cross-lingually. Such a study provides further insight into the nature and function of emotion, and supports the development of a generalized vocal emotion recognition system, which would improve the efficiency of human-machine interaction systems. An emotional speech database was created with 140 simulated utterances per speaker (20 per emotion), consisting of short sentences covering six full-blown discrete basic emotions and one 'no-emotion' (i.e., neutral) category in five native languages (not dialects) of Assam. A new feature set is proposed, based on the Eigenvalues of the Autocorrelation Matrix (EVAM) of each frame of an utterance, with a Gaussian Mixture Model (GMM) as the classifier. The performance of the EVAM feature set is compared at two sampling frequencies (44.1 kHz and 8.1 kHz) and under additive white noise at signal-to-noise ratios of 0, 5, 10, and 20 dB.
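The two signal-level steps named above, additive white noise at a target SNR and per-frame eigenvalues of the autocorrelation matrix, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the frame length, hop size, and autocorrelation-matrix order are assumed values (the abstract does not specify them), and the GMM classification stage (e.g., one GMM fit per emotion class on these features) is omitted.

```python
import numpy as np

def add_white_noise(x, snr_db, rng):
    """Mix in white Gaussian noise scaled to a target SNR (in dB)."""
    noise = rng.standard_normal(len(x))
    p_signal = np.mean(x ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return x + scale * noise

def evam_features(x, frame_len=400, hop=160, order=10):
    """Per-frame EVAM features: eigenvalues of an order-by-order Toeplitz
    autocorrelation matrix built from biased autocorrelation estimates of
    each frame (frame_len, hop, and order are illustrative choices)."""
    feats = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        # biased autocorrelation estimates r[0..order-1]
        r = np.array([np.dot(frame[:frame_len - k], frame[k:]) / frame_len
                      for k in range(order)])
        # symmetric Toeplitz autocorrelation matrix
        R = np.array([[r[abs(i - j)] for j in range(order)]
                      for i in range(order)])
        # symmetric PSD matrix -> real, non-negative eigenvalues;
        # sort descending for a fixed feature ordering
        feats.append(np.sort(np.linalg.eigvalsh(R))[::-1])
    return np.array(feats)

rng = np.random.default_rng(0)
clean = rng.standard_normal(8000)            # 1 s of stand-in audio at 8 kHz
noisy = add_white_noise(clean, snr_db=10, rng=rng)
F = evam_features(noisy)
print(F.shape)                               # (num_frames, order)
```

In practice the eigenvalue vector of each frame would serve as the feature input to per-emotion GMMs, with the test utterance assigned to the emotion whose model gives the highest likelihood.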