Inferring Depression and Affect from Application Dependent Meta Knowledge

This paper outlines our contribution to the 2014 edition of the AVEC competition. It comprises classification results and considerations for both the continuous affect recognition sub-challenge and the depression recognition sub-challenge. Rather than relying on statistical features extracted from the raw audio-visual data, we propose an approach based on abstract meta information about individual subjects, together with prototypical task- and label-dependent templates, to infer the respective emotional states. The results submitted to both parts of the challenge significantly outperformed the baseline approaches. Further, we elaborate on several issues concerning the labeling of affective corpora and the choice of appropriate performance measures.
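To illustrate the flavor of a template-based prediction scheme (this is a hypothetical sketch, not the authors' actual method or code): a prototypical, task-dependent label template can be built as the per-frame average of the training subjects' label traces, and then emitted, resampled to the test recording's length, as the prediction for an unseen subject, without consulting the audio-visual features at all.

```python
import numpy as np

def build_template(train_labels):
    """Prototypical template: per-frame mean of the training label traces."""
    return np.mean(np.stack(train_labels), axis=0)

def predict(template, n_frames):
    """Predict the template itself, linearly resampled to n_frames."""
    x_old = np.linspace(0.0, 1.0, num=len(template))
    x_new = np.linspace(0.0, 1.0, num=n_frames)
    return np.interp(x_new, x_old, template)

# Toy example: three training subjects, arousal traces of equal length.
train = [np.array([0.1, 0.3, 0.5]),
         np.array([0.2, 0.4, 0.6]),
         np.array([0.3, 0.5, 0.7])]
template = build_template(train)      # array([0.2, 0.4, 0.6])
pred = predict(template, n_frames=5)  # template resampled to 5 frames
```

Such a label-side baseline is useful precisely because it exposes how much of a challenge's apparent difficulty is carried by regularities in the labels themselves rather than by the signal content.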
