Automated Data Slicing for Model Validation: A Big Data - AI Integration Approach

As machine learning systems become democratized, it becomes increasingly important to help users easily debug their models. However, current data tools remain primitive when it comes to helping users trace model performance problems all the way back to the data. We focus on the particular problem of slicing data to identify subsets of the validation data where the model performs poorly. This is an important problem in model validation because the overall model performance can fail to reflect that of smaller subsets, and slicing allows users to analyze the model performance at a more granular level. Unlike general techniques (e.g., clustering) that can find arbitrary slices, our goal is to find interpretable slices (which are easier to act on than arbitrary subsets) that are both problematic and large. We propose SliceFinder, an interactive framework for identifying such slices using statistical techniques. Applications include diagnosing model fairness and fraud detection, where identifying slices that are interpretable to humans is crucial. This research is part of a larger trend of Big Data and Artificial Intelligence (AI) integration and opens many opportunities for new research.
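
The abstract describes SliceFinder's approach only at a high level ("statistical techniques"), so the following is a minimal sketch of the core idea under illustrative assumptions: enumerate single-feature slices over a pandas DataFrame of per-example losses, compare each slice against its complement with Welch's t-test, and additionally require a Cohen's-d-style effect size so that flagged slices have a materially higher loss, not merely a statistically significant one. The function name find_problematic_slices, the column names, and the thresholds are hypothetical, not the paper's actual API.

import numpy as np
import pandas as pd
from scipy import stats

def find_problematic_slices(df, feature_cols, loss_col="loss",
                            min_size=30, alpha=0.05, min_effect=0.4):
    """Flag single-feature slices whose average loss is significantly
    and substantially higher than the rest of the data."""
    results = []
    for col in feature_cols:
        for value, slice_df in df.groupby(col):
            rest = df[df[col] != value]
            if len(slice_df) < min_size or len(rest) < min_size:
                continue  # slice too small to test reliably
            # Welch's t-test: tolerates unequal variances between
            # the slice and its complement
            _, p_value = stats.ttest_ind(slice_df[loss_col], rest[loss_col],
                                         equal_var=False)
            # Effect size (Cohen's-d style): measures how large the loss
            # gap is, not merely whether it is statistically significant
            effect = ((slice_df[loss_col].mean() - rest[loss_col].mean())
                      / df[loss_col].std())
            if p_value < alpha and effect > min_effect:
                results.append({"slice": f"{col} = {value}",
                                "size": len(slice_df),
                                "avg_loss": round(slice_df[loss_col].mean(), 3),
                                "p_value": p_value,
                                "effect_size": round(effect, 2)})
    # Largest effect first: most problematic slices on top
    return sorted(results, key=lambda r: -r["effect_size"])

# Toy usage: the "CA" slice has a much higher per-example loss and is flagged.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "country": ["US"] * 100 + ["CA"] * 100,
    "loss": np.concatenate([rng.normal(0.2, 0.05, 100),
                            rng.normal(0.9, 0.05, 100)]),
})
print(find_problematic_slices(df, ["country"]))

Each returned slice is a human-readable predicate such as country = CA, which is what makes the result interpretable and actionable compared to an arbitrary cluster of examples. A full treatment would also cross features into higher-order slices and control the false discovery rate across the many hypotheses tested.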
