Design of an explainable machine learning challenge for video interviews

This paper reviews and discusses research advances on “explainable machine learning” in computer vision. We focus on a particular area of the “Looking at People” (LAP) thematic domain: first impressions and personality analysis. Our aim is to make the computational intelligence and computer vision communities aware of the importance of developing explanatory mechanisms for computer-assisted decision-making applications, such as automated recruitment. Human resource departments routinely make judgments based on personality traits to evaluate candidates' capacity for social integration and their potential for career growth. However, inferring personality traits and, more generally, the process by which humans form a first impression of people are highly subjective and may be biased. Previous studies have demonstrated that learning machines can learn to mimic human decisions. In this paper, we go one step further and formulate the problem of explaining the decisions of the models as a means of identifying which visual aspects are important, understanding how they relate to the suggested decisions, and possibly gaining insight into undesirable biases. We design a new challenge on the explainability of learning machines for first impressions analysis. We describe the setting, scenario, evaluation metrics, and preliminary outcomes of the competition. To the best of our knowledge, this is the first effort to organize a challenge on explainability in computer vision. In addition, our challenge design comprises several other quantitative and qualitative elements of novelty, including a “coopetition” setting, which combines competition and collaboration.
