Classifier-based learning of nonlinear feature manifold for visualization of emotional speech prosody

Visualization of emotional speech data is an important tool for speech researchers seeking deeper insight into the structure of complex, multidimensional data. A visualization method is presented that uses feature selection and classifier optimization to learn Isomap manifolds of emotional speech data. The resulting manifold is built from the features that best discriminate between the given emotion classes in a target space of specified embedding dimension. A nonlinear mapping function based on generalized regression neural networks (GRNNs) extends the mapping to new, unseen data. A low-dimensional manifold of emotional speech containing neutral, sad, angry, and happy expressions was constructed from prosodic and acoustic features of speech. Experimental results indicate that a three-dimensional (3D) embedding provides the best classification performance. The manifold structure can be readily visualized and matches the circumplex and conical shapes predicted by dimensional models of emotion. Listening tests show excellent correlation between the organization of the data on the manifold and the listeners' judgments of emotional intensity.
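The GRNN step that carries new utterances onto the learned manifold can be illustrated with a minimal sketch. It assumes the standard GRNN formulation, in which the output is a Gaussian-kernel weighted average of the training targets (equivalent to Nadaraya-Watson kernel regression). The training inputs are assumed to be the selected feature vectors of the training utterances, and the targets are assumed to be their Isomap coordinates. The function name `grnn_predict`, the single shared spread parameter `sigma`, and the toy data are hypothetical; the abstract does not say how the spread was chosen.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def grnn_predict(train_x: Sequence[Vector], train_y: Sequence[Vector],
                 x: Vector, sigma: float) -> List[float]:
    """Map a new feature vector x to embedding coordinates with a GRNN.

    Hypothetical sketch: train_x holds (assumed) selected feature vectors,
    train_y their (assumed) Isomap coordinates, sigma the kernel spread.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if not train_x or len(train_x) != len(train_y):
        raise ValueError("need equally many, non-empty inputs and targets")

    # Pattern layer: squared Euclidean distance to every training sample.
    sq_dists = [sum((a - b) ** 2 for a, b in zip(xi, x)) for xi in train_x]

    # Gaussian activations. Subtracting the smallest distance avoids
    # underflow; the constant factor it introduces cancels in the ratio.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)  # >= 1, because the nearest sample has weight 1

    # Summation and output layers: kernel-weighted mean of the targets.
    dim = len(train_y[0])
    return [sum(w * y[k] for w, y in zip(weights, train_y)) / total
            for k in range(dim)]


if __name__ == "__main__":
    # Toy 2-D "features" mapped to 3-D "embedding" coordinates.
    feats = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    # A point equidistant from all samples maps to the mean target, [1/3, 1/3, 0].
    print(grnn_predict(feats, coords, [0.5, 0.5], sigma=1.0))
```

The spread controls the trade-off: a small `sigma` makes the mapping behave like a nearest-neighbour lookup, while a very large `sigma` pulls every prediction toward the mean of the training coordinates.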
