Explaining First Impressions: Modeling, Recognizing, and Explaining Apparent Personality from Videos

Explainability and interpretability are two critical aspects of decision support systems. Within computer vision, they are particularly important in tasks related to human behavior analysis, such as health care applications. Despite their importance, researchers have only recently started to explore these aspects. This paper provides an introduction to explainability and interpretability in computer vision, with an emphasis on looking-at-people tasks. Specifically, we review and study these mechanisms in the context of first impressions analysis. To the best of our knowledge, this is the first effort in this direction. Additionally, we describe a challenge we organized on explainability in first impressions analysis from video. We analyze the newly introduced dataset and the evaluation protocol in detail, and we summarize the results of the challenge. Finally, based on our study, we outline research opportunities that we expect to be decisive in the near future for the development of the explainable computer vision field.