Exploiting Psychological Factors for Interaction Style Recognition in Spoken Conversation

Determining how a speaker is engaged in a conversation is crucial for achieving harmonious interaction between computers and humans. In this study, a fusion approach based on psychological factors was developed to recognize Interaction Style (IS) in spoken conversation, which plays a key role in creating natural dialogue agents. The proposed Fused Cross-Correlation Model (FCCM) provides a unified probabilistic framework that models the relationships among four psychological factors (emotion, personality trait (PT), transient IS, and IS history) in order to recognize IS. An emotional arousal-dependent speech recognizer was used to obtain the recognized spoken text, from which linguistic features were extracted to estimate the transient IS likelihood and to recognize PT. A temporal course modeling approach and an emotional sub-state language model, both based on the temporal phases of an emotional expression, were employed to improve emotion recognition. The experimental results indicate that the proposed FCCM yields satisfactory results in IS recognition and demonstrate that combining psychological factors effectively improves IS recognition accuracy.
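
The idea of fusing several psychological factors in one probabilistic score can be illustrated with a minimal sketch. This is not the FCCM itself, whose formulation is not given here: it simply multiplies a transient IS likelihood by conditional probabilities for emotion, PT, and the previous IS, under a naive independence assumption. The style labels `A` and `B`, the function name `fuse_is_scores`, and all probability values are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical IS labels; the abstract does not name the actual styles.
IS_LABELS = ["A", "B"]


def fuse_is_scores(transient_lik, emotion_given_is, pt_given_is,
                   history_transition, emotion, pt, prev_is):
    """Combine factor scores into a normalized posterior over IS labels.

    Assumes (for illustration only) that the factors are conditionally
    independent given the IS label, so their probabilities multiply.
    """
    log_scores = {}
    for style in IS_LABELS:
        log_scores[style] = (
            math.log(transient_lik[style])
            + math.log(emotion_given_is[style][emotion])
            + math.log(pt_given_is[style][pt])
            + math.log(history_transition[prev_is][style])
        )
    # Normalize in the log domain for numerical stability.
    peak = max(log_scores.values())
    unnorm = {s: math.exp(v - peak) for s, v in log_scores.items()}
    total = sum(unnorm.values())
    return {s: u / total for s, u in unnorm.items()}


# Made-up probabilities, not values from the study.
transient_lik = {"A": 0.6, "B": 0.4}
emotion_given_is = {
    "A": {"happy": 0.7, "neutral": 0.3},
    "B": {"happy": 0.2, "neutral": 0.8},
}
pt_given_is = {
    "A": {"extravert": 0.5, "introvert": 0.5},
    "B": {"extravert": 0.5, "introvert": 0.5},
}
history_transition = {
    "A": {"A": 0.8, "B": 0.2},
    "B": {"A": 0.3, "B": 0.7},
}

posterior = fuse_is_scores(
    transient_lik, emotion_given_is, pt_given_is, history_transition,
    emotion="happy", pt="extravert", prev_is="A",
)
best_style = max(posterior, key=posterior.get)
```

In this toy setup the emotion and history terms reinforce the transient likelihood, so the fused posterior is more peaked than the transient IS likelihood alone, which is the intuition behind combining the factors.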
