Multimodal First Impression Analysis with Deep Residual Networks

People form first impressions about the personalities of unfamiliar individuals even after very brief interactions with them. In this study, we present and evaluate several models that mimic this automatic social behavior. Specifically, we present models trained on a large dataset of short YouTube video blog posts that predict people's apparent Big Five personality traits and whether they seem suitable to be invited to a job interview. Along with presenting our audiovisual approach and the results that won third place in the ChaLearn First Impressions Challenge, we investigate modeling in different modalities: audio only, visual only, language only, audiovisual, and a combination of audiovisual and language. Our results demonstrate that the best performance is obtained using a fusion of all data modalities. Finally, to promote explainability in machine learning and to provide an example for the upcoming ChaLearn challenges, we present a simple approach for explaining the predictions for job interview recommendations.
