Variable Importance Clouds: A Way to Explore Variable Importance for the Set of Good Models

Variable importance is central to scientific studies in the social sciences, causal inference, healthcare, and other domains. However, current notions of variable importance are often tied to a specific predictive model. This is problematic: what if there are multiple well-performing predictive models, and a specific variable is important to some of them but not to others? In that case, we cannot tell from a single well-performing model whether a variable is always important in predicting the outcome. Rather than depending on variable importance for a single predictive model, we would like to explore variable importance for all approximately-equally-accurate predictive models. This work introduces the concept of a variable importance cloud, which maps every variable to its importance for every good predictive model. We show properties of the variable importance cloud and draw connections to other areas of statistics. We introduce variable importance diagrams as a projection of the variable importance cloud into two dimensions for visualization purposes. Experiments with criminal justice and marketing data illustrate how dramatically a variable's importance can change across approximately-equally-accurate predictive models.
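
To make the idea concrete, here is a minimal sketch of how one might approximate a variable importance cloud. It is not the authors' algorithm: it assumes the set of good models is approximated by candidate models whose loss falls within a tolerance (called epsilon below, an illustrative choice) of the best loss found, and it uses a simple permutation-style importance. All model choices, dataset, and parameter names are assumptions for illustration.

```python
# Sketch: approximate a variable importance cloud by
# (1) collecting models whose loss is within a small tolerance of the best loss found,
# (2) computing a permutation-style importance for every variable under every such model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=6, random_state=0)

# 1. Generate candidate models (here: logistic regressions refit on bootstrap
#    resamples) and keep the approximately-equally-accurate ones.
candidates = []
for _ in range(200):
    idx = rng.integers(0, len(X), len(X))
    m = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    candidates.append((m, log_loss(y, m.predict_proba(X))))

best_loss = min(loss for _, loss in candidates)
epsilon = 0.05  # tolerance defining the set of "good" models (illustrative)
good_models = [m for m, loss in candidates if loss <= best_loss + epsilon]

# 2. Permutation importance of variable j under a model:
#    the increase in loss when column j is shuffled.
def permutation_importance(model, X, y, j, n_repeats=5):
    base = log_loss(y, model.predict_proba(X))
    increases = []
    for _ in range(n_repeats):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        increases.append(log_loss(y, model.predict_proba(Xp)) - base)
    return float(np.mean(increases))

# Rows = good models, columns = variables: the cloud of importance values.
cloud = np.array([[permutation_importance(m, X, y, j) for j in range(X.shape[1])]
                  for m in good_models])
print("cloud shape (models x variables):", cloud.shape)
print("importance range per variable:", cloud.min(axis=0), cloud.max(axis=0))
```

Plotting two columns of `cloud` against each other gives a rough analogue of a variable importance diagram: a wide spread for a variable indicates that its importance differs substantially across approximately-equally-accurate models.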
