Visual Identification of Problematic Bias in Large Label Spaces

While the need for well-trained, fair machine learning (ML) systems keeps growing, measuring fairness for modern models and datasets becomes increasingly difficult as they grow at an unprecedented pace. One key challenge in scaling common fairness metrics to such models and datasets is the requirement of exhaustive ground-truth labeling, which is not always feasible and thus often rules out traditional analysis metrics and systems. At the same time, ML-fairness assessments cannot be made purely algorithmically, as fairness is a highly subjective matter. Domain experts therefore need to be able to extract and reason about bias throughout models and datasets to make informed decisions. While visual analysis tools can be of great help when investigating potential bias in deep learning (DL) models, none of the existing approaches have been designed for the specific tasks and challenges that arise in large label spaces. Addressing this lack of visualization work, we propose guidelines for designing visualizations for such large label spaces, considering both technical and ethical issues. Our proposed visualization approach can be integrated into classical model and data pipelines, and we provide an open-source implementation of our techniques as a TensorBoard plug-in. With our approach, different models and datasets with large label spaces can be systematically and visually analyzed and compared to make informed fairness assessments that tackle problematic bias.
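
To make the ground-truth problem concrete, the following is a minimal, hypothetical sketch of one proxy measure that operates on model predictions alone: ranking predicted labels by their normalized pointwise mutual information (NPMI) with a chosen sensitive label, so that labels co-occurring with it far more often than chance can be surfaced for closer visual inspection. The label names, the toy predictions, and the choice of NPMI are illustrative assumptions, not the paper's actual method.

```python
# A minimal sketch of surfacing potential label bias without ground truth:
# measure how strongly each predicted label co-occurs with a chosen
# "sensitive" label using normalized pointwise mutual information (NPMI).
# All names and the toy predictions below are illustrative assumptions.

import math

# Toy multi-label predictions: one set of predicted labels per image.
predictions = [
    {"person", "woman", "kitchen", "oven"},
    {"person", "man", "car", "road"},
    {"person", "woman", "kitchen", "sink"},
    {"person", "man", "office", "laptop"},
    {"person", "woman", "office", "laptop"},
]

def npmi(label_a, label_b, predictions):
    """Normalized PMI of two labels over a list of predicted label sets."""
    n = len(predictions)
    p_a = sum(label_a in p for p in predictions) / n
    p_b = sum(label_b in p for p in predictions) / n
    p_ab = sum(label_a in p and label_b in p for p in predictions) / n
    if p_ab == 0:
        return -1.0  # labels never co-occur
    if p_ab == 1:
        return 1.0   # labels always co-occur
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)

# Rank all other labels by their association with a sensitive label.
sensitive = "woman"
labels = set().union(*predictions) - {sensitive}
ranking = sorted(labels, key=lambda l: npmi(sensitive, l, predictions), reverse=True)
for label in ranking:
    print(f"{label:10s} NPMI({sensitive!r}, {label!r}) = {npmi(sensitive, label, predictions):+.2f}")
```

In this toy example, "kitchen" receives a high positive NPMI with "woman" while "man" receives -1, illustrating the kind of spurious association such a co-occurrence measure can expose and that a visual tool would then present to a domain expert for a fairness judgment.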
