Improving emotion detection from speech with an enhanced approach

Emotion detection is an important and active area of speech analysis. Detection can be based on a single effective parameter or on a combination of parameters, and combining several parameters reliably yields higher accuracy than any single parameter alone. Energy, MFCCs, pitch, timbre, and vocal tract frequencies have proven effective parameters for improving detection accuracy. The work presented here uses a database of 1200 speech files in Marathi, an Indian language, recorded by male and female professional actors for six emotions: happy, angry, neutral, sad, fear, and surprised. The results for Marathi are consistent with those reported for other languages, indicating that emotion detection is largely language-independent. Adding an effective classifier such as a neural network can further push recognition accuracy close to 100%. The work shows that combining the results of the individual parameters improves detection accuracy, and in the future it may lead to systems capable of understanding human emotions.
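As a rough illustration of the parameter-combination idea described above, the sketch below extracts two of the mentioned acoustic parameters, short-time energy and pitch, and stacks their statistics into one feature vector. It is a minimal numpy-only sketch on a synthetic tone standing in for a speech recording; the function names, frame sizes, and the autocorrelation pitch estimator are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

def frame_energy(signal, frame_len=400, hop=160):
    """Short-time energy per frame (25 ms frames, 10 ms hop at 16 kHz)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.sum(f ** 2) for f in frames])

def autocorr_pitch(signal, sr, fmin=50, fmax=500):
    """Crude pitch estimate: pick the autocorrelation peak in the
    plausible speech-pitch lag range [sr/fmax, sr/fmin]."""
    sig = signal - np.mean(signal)
    ac = np.correlate(sig, sig, mode="full")[len(sig) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

# Synthetic 200 Hz tone stands in for one utterance (hypothetical input).
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 200 * t)

energy = frame_energy(tone)
pitch = autocorr_pitch(tone, sr)

# Combine per-parameter statistics into a single feature vector,
# which a classifier (e.g. a neural network) would then consume.
features = np.array([energy.mean(), energy.std(), pitch])
print(features)
```

In a real system, MFCCs and the other parameters listed in the abstract would be appended to the same vector before it is fed to the classifier.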
