The Holdout Randomization Test for Feature Selection in Black Box Models

We propose the holdout randomization test (HRT), an approach to feature selection using black box predictive models. The HRT is a specialized version of the conditional randomization test (CRT) tha...
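As a rough illustration of the general idea of running a conditional randomization test on held-out data, the sketch below scores a fixed predictor on a holdout set, repeatedly replaces one feature with draws from an assumed conditional distribution, and compares the resulting empirical risks. It is a minimal sketch under stated assumptions, not the authors' implementation. The names `hrt_p_value`, `predict`, and `sample_conditional` are hypothetical, squared error is assumed as the test statistic, and the toy example assumes independent standard normal features, so the conditional draw reduces to a marginal draw. The predictor is written by hand here and stands in for a black box model fitted on a separate training split.

```python
import random


def hrt_p_value(predict, X_holdout, y_holdout, j, sample_conditional,
                n_draws=199, seed=0):
    """One-sided Monte Carlo p-value for feature j on held-out data.

    predict            -- already-fitted black box: row (list of floats) -> float
    sample_conditional -- hypothetical helper: (row, j, rng) -> a draw of
                          feature j given the other features in row
    """
    rng = random.Random(seed)

    def risk(rows):
        # Empirical squared-error risk on the holdout set (assumed statistic).
        total = sum((predict(r) - y) ** 2 for r, y in zip(rows, y_holdout))
        return total / len(y_holdout)

    t_obs = risk(X_holdout)
    n_as_good = 0
    for _ in range(n_draws):
        # Replace column j with a fresh conditional draw; the model is not refit.
        X_null = [row[:j] + [sample_conditional(row, j, rng)] + row[j + 1:]
                  for row in X_holdout]
        if risk(X_null) <= t_obs:
            n_as_good += 1
    # Add-one correction keeps the p-value valid for a finite number of draws.
    return (1 + n_as_good) / (1 + n_draws)


# Toy holdout data: y depends on feature 0 only; feature 1 is pure noise.
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(200)]
y = [2.0 * row[0] + data_rng.gauss(0.0, 0.5) for row in X]


def predict(row):
    # Stand-in for a model trained elsewhere (assumption for this sketch).
    return 2.0 * row[0]


def sample_independent(row, j, rng):
    # Assumes independent N(0, 1) features, so the conditional is the marginal.
    return rng.gauss(0.0, 1.0)


p_values = [hrt_p_value(predict, X, y, j, sample_independent) for j in range(2)]
print(p_values)
```

Because the predictor is evaluated rather than refit for each draw, the cost of each additional null sample is one pass over the holdout set. The resulting per-feature p-values could then be passed to a multiple testing procedure such as Benjamini-Hochberg.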
