Multimodal Video-based Apparent Personality Recognition Using Long Short-Term Memory and Convolutional Neural Networks

Personality computing and affective computing, in which recognizing personality traits is essential, have recently attracted growing interest in many research areas. We propose a novel approach to recognizing the Big Five personality traits of people from videos. Personality and emotion affect speaking style, facial expressions, body movements, and linguistic factors in social contexts, and they are in turn affected by environmental elements. We develop a multimodal system that recognizes apparent personality from several modalities: face, environment, audio, and transcription features. We use modality-specific neural networks that learn to recognize the traits independently, and we obtain the final prediction of apparent personality through a feature-level fusion of these networks. We employ pre-trained deep convolutional neural networks, such as ResNet and VGGish, to extract high-level features and Long Short-Term Memory networks to integrate temporal information. We train the large model, which consists of modality-specific subnetworks, using a two-stage process: we first train the subnetworks separately and then fine-tune the overall model starting from these trained networks. We evaluate the proposed method on the ChaLearn First Impressions V2 challenge dataset. Our approach obtains the best overall “mean accuracy” score, averaged over the five personality traits, compared to the state of the art.
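As a rough illustration of the feature-level fusion and the "mean accuracy" score mentioned above, the following minimal sketch concatenates per-modality feature vectors and feeds them to a small regression head. It is not the authors' implementation: the function names, the toy feature values, and the linear-plus-sigmoid head are hypothetical, and the metric is assumed to be one minus the mean absolute error per trait, averaged over the five traits, which is how the ChaLearn First Impressions challenge is commonly described. In the real system the feature vectors would come from the trained ResNet, VGGish, and LSTM subnetworks.

```python
import math

# Big Five traits and the four modalities named in the abstract.
TRAITS = ("openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism")
MODALITIES = ("face", "environment", "audio", "transcription")


def fuse_features(features):
    """Feature-level fusion: concatenate per-modality vectors in a fixed order."""
    fused = []
    for name in MODALITIES:
        fused.extend(features[name])
    return fused


def predict_traits(fused, weights, biases):
    """Hypothetical head: one linear unit plus sigmoid per trait, scores in [0, 1]."""
    scores = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, fused)) + b
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores


def mean_accuracy(predictions, targets):
    """Assumed metric: (1 - mean absolute error) per trait, averaged over traits."""
    n_traits = len(targets[0])
    per_trait = []
    for t in range(n_traits):
        mae = sum(abs(p[t] - y[t]) for p, y in zip(predictions, targets)) / len(targets)
        per_trait.append(1.0 - mae)
    return sum(per_trait) / n_traits


# Toy stand-ins for subnetwork outputs (illustrative values only).
features = {
    "face": [0.2, -0.1],
    "environment": [0.5],
    "audio": [0.0, 0.3],
    "transcription": [-0.4],
}
fused = fuse_features(features)

# Zero weights and biases, so every trait score is sigmoid(0) = 0.5.
weights = [[0.0] * len(fused) for _ in TRAITS]
biases = [0.0] * len(TRAITS)
scores = predict_traits(fused, weights, biases)
```

In a two-stage setup such as the one described, the modality subnetworks would first be trained on their own, and the fused head and subnetworks would then be fine-tuned jointly; the sketch covers only the forward pass and the scoring.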
