Speech activity detection and automatic prosodic processing unit segmentation for emotion recognition

In speech communication, emotions play an important role in conveying information. These emotions arise partly as reactions to our environment and to our partners during a conversation. Understanding these reactions and recognizing them automatically is highly important: through them, we can get a clearer picture of our conversational partner's response. In Cognitive InfoCommunication, this kind of information helps us develop robots and devices that are more aware of the needs of the user, making them easier and more enjoyable to use. In our laboratory we conducted automatic emotion classification and speech segmentation experiments. To develop an automatic emotion recognition system based on speech, an automatic speech segmenter is also needed to separate the speech segments used for emotion analysis. In our earlier research we found that the intonational phrase can be a suitable unit for emotion analysis. In this paper, speech detection and segmentation methods are developed. For speech detection, Hidden Markov Models are used with various noise and speech acoustic models. The results show that the procedure detects speech in the sound signal with more than 91% accuracy and segments it into intonational phrases.
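The abstract names HMM-based speech detection with separate noise and speech acoustic models. As an illustrative sketch only (not the authors' actual models), the idea can be reduced to a two-state hidden Markov model decoded with the Viterbi algorithm; here the emission model is a hypothetical Gaussian over per-frame energy, whereas a real system would use richer acoustic features:

```python
import numpy as np

def viterbi_vad(energies, means=(0.1, 1.0), stds=(0.1, 0.3), p_stay=0.95):
    """Two-state (0 = noise, 1 = speech) HMM Viterbi decoding over frame energies.

    Each state emits frame energy with a Gaussian likelihood; a high
    self-transition probability smooths the speech/non-speech decision
    over time. Parameters are hypothetical, for illustration only.
    """
    energies = np.asarray(energies, dtype=float)
    n = len(energies)
    # Log emission likelihood of every frame under both state models
    # (Gaussian log-density up to a shared constant).
    log_emit = np.empty((n, 2))
    for s in range(2):
        log_emit[:, s] = (-0.5 * ((energies - means[s]) / stds[s]) ** 2
                          - np.log(stds[s]))
    log_trans = np.log([[p_stay, 1 - p_stay],
                        [1 - p_stay, p_stay]])
    # Viterbi forward pass with back-pointers.
    delta = log_emit[0].copy()
    back = np.zeros((n, 2), dtype=int)
    for t in range(1, n):
        new_delta = np.empty(2)
        for s in range(2):
            scores = delta + log_trans[:, s]
            back[t, s] = int(np.argmax(scores))
            new_delta[s] = scores[back[t, s]] + log_emit[t, s]
        delta = new_delta
    # Backtrack the most likely speech/non-speech state sequence.
    path = np.zeros(n, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(n - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path
```

The contiguous runs of state 1 in the decoded path correspond to detected speech regions, which a segmenter could then split further into prosodic units.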
