A Review of Generalizable Transfer Learning in Automatic Emotion Recognition

Automatic emotion recognition is the process of identifying human emotion from signals such as facial expressions, speech, and text. Collecting and labeling such signals is often tedious and frequently requires expert knowledge. Transfer learning is an effective way to address the resulting challenges of data scarcity and the lack of human labels. In this manuscript, we describe fundamental concepts in the field of transfer learning and review work that has successfully applied transfer learning to automatic emotion recognition. Finally, we discuss promising future research directions for transfer learning aimed at improving the generalizability of automatic emotion recognition systems.
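To make the core idea concrete, the following is a minimal sketch (not taken from any specific work reviewed here) of one common form of transfer learning for emotion recognition: reusing a backbone pretrained on a large source domain (ImageNet) and fine-tuning only a small task-specific head on a scarce facial-emotion corpus. The class count, label set, and dummy batch are illustrative assumptions; it presumes PyTorch with torchvision >= 0.13.

```python
# Minimal transfer-learning sketch (illustrative assumptions, not the authors' method):
# fine-tune an ImageNet-pretrained CNN head for facial emotion recognition
# when labeled emotion data are scarce.
import torch
import torch.nn as nn
from torchvision import models

NUM_EMOTIONS = 7  # hypothetical target classes (e.g., anger, disgust, fear, happiness, sadness, surprise, neutral)

# 1. Start from a backbone pretrained on a large, unrelated source domain (ImageNet).
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# 2. Freeze the source-domain features so the small emotion corpus trains only the new head.
for param in backbone.parameters():
    param.requires_grad = False

# 3. Replace the classifier with a task-specific head for the emotion labels.
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_EMOTIONS)

# 4. Optimize only the new head's parameters.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of face crops (N, 3, 224, 224).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_EMOTIONS, (8,))
logits = backbone(images)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```

In practice the dummy batch would be replaced by a dataloader over the target emotion corpus, and deeper layers may be unfrozen once the head has converged.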
