Casual Conversations: A dataset for measuring fairness in AI

This paper introduces a novel "fairness" dataset for measuring the robustness of AI models across a diverse set of ages, genders, apparent skin tones, and ambient lighting conditions. Our dataset comprises 3,011 subjects and contains over 45,000 videos, with an average of 15 videos per person. The videos were recorded in multiple U.S. states with a diverse set of adults spanning various age, gender, and apparent skin tone groups. A key feature is that each subject consented to participate and to have their likeness used. Additionally, our age and gender annotations are provided by the subjects themselves. A group of trained annotators labeled the subjects' apparent skin tone using the Fitzpatrick skin type scale [6]. Annotations marking videos recorded in low ambient lighting are also provided. As an application measuring the robustness of predictions across these attributes, we evaluate state-of-the-art apparent age and gender classification methods. Our experiments provide a thorough analysis of these models in terms of fair treatment of people from various backgrounds.
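As a minimal sketch of the kind of evaluation described above, the snippet below computes per-subgroup accuracy and the max-min accuracy disparity for a gender classifier. The data layout (flat lists of predictions, labels, and Fitzpatrick skin-type groups) is illustrative only, not the paper's actual format or protocol.

```python
# Hypothetical sketch: per-group accuracy and max-min disparity for a
# classifier, grouped by an attribute such as Fitzpatrick skin type.
# The input format here is assumed for illustration.
from collections import defaultdict

def per_group_accuracy(preds, labels, groups):
    """Return accuracy per subgroup and the max-min accuracy disparity."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    acc = {g: correct[g] / total[g] for g in total}
    disparity = max(acc.values()) - min(acc.values())
    return acc, disparity

# Toy example: six predictions across three (assumed) skin-type bins.
preds  = ["F", "M", "F", "M", "M", "F"]
labels = ["F", "M", "M", "M", "F", "F"]
groups = ["I-II", "I-II", "III-IV", "III-IV", "V-VI", "V-VI"]
acc, gap = per_group_accuracy(preds, labels, groups)
```

A disparity near zero suggests the model treats the subgroups comparably on this metric; a large gap flags a subgroup the model underserves.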

[1]  Alper Ozpinar, et al.  LightFace: A Hybrid Deep Face Recognition Framework, 2020, 2020 Innovations in Intelligent Systems and Applications Conference (ASYU).

[2]  Yang Song, et al.  Age Progression/Regression by Conditional Adversarial Autoencoder, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Inioluwa Deborah Raji, et al.  Saving Face: Investigating the Ethical Concerns of Facial Recognition Auditing, 2020, AIES.

[4]  Tal Hassner, et al.  Age and gender classification using convolutional neural networks, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[5]  Xia Hu, et al.  Fairness in Deep Learning: A Computational Perspective, 2019, IEEE Intelligent Systems.

[6]  Anil K. Jain, et al.  Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark A, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Tal Hassner, et al.  Age and Gender Estimation of Unfiltered Faces, 2014, IEEE Transactions on Information Forensics and Security.

[8]  Yi-Ming Chan, et al.  Joint Estimation of Age and Gender from Unconstrained Face Images Using Lightweight Multi-Task CNN for Mobile Applications, 2018, 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR).

[9]  Davis E. King, et al.  Dlib-ml: A Machine Learning Toolkit, 2009, J. Mach. Learn. Res.

[10]  Diego H. Milone, et al.  Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis, 2020, Proceedings of the National Academy of Sciences.

[11]  Fei-Fei Li, et al.  Towards fairer datasets: filtering and balancing the distribution of the people subtree in the ImageNet hierarchy, 2019, FAT*.

[12]  Kimmo Kärkkäinen, et al.  FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age, 2019, ArXiv.

[13]  Timnit Gebru, et al.  Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification, 2018, FAT*.

[14]  Ben Hutchinson, et al.  Non-portability of Algorithmic Fairness in India, 2020, ArXiv.

[15]  Susan C. Taylor, et al.  Racial limitations of Fitzpatrick skin type, 2020, Cutis.

[16]  Inioluwa Deborah Raji, et al.  Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products, 2019, AIES.

[17]  M. Jeanmougin  Soleil et peau [Sun and Skin], 1992.

[18]  Laurens van der Maaten, et al.  Does Object Recognition Work for Everyone?, 2019, CVPR Workshops.