Aggregation of Multiple Knockoffs

We develop an extension of the Knockoff Inference procedure introduced by Barber and Candès (2015). This new method, called Aggregation of Multiple Knockoffs (AKO), addresses the instability that stems from the randomness of Knockoff-based inference. Specifically, AKO improves both stability and power relative to the original Knockoff algorithm while retaining its guarantees of False Discovery Rate (FDR) control. We provide a new inference procedure, prove its core properties, and demonstrate its benefits in experiments on synthetic and real datasets.
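The abstract does not spell out how the multiple knockoff runs are combined. As an illustration only, the Python sketch below assumes one plausible scheme built from procedures in the reference list. Each of B independent knockoff runs yields statistics W_j^(b). These are turned into intermediate p-values, combined across runs in the style of the quantile aggregation of [27], and thresholded with the Benjamini-Hochberg step-up procedure of [17]. Several details are assumptions for illustration, not the paper's specification: the intermediate p-value formula, the quantile convention, the defaults q = 0.2 and gamma = 0.5, and every function name, including the toy generator `demo_statistics`. Constructing the knockoffs and computing W is left out; the statistics are taken as given.

```python
# Illustrative sketch: the aggregation scheme below is an ASSUMPTION,
# not necessarily the exact AKO procedure described in the paper.
import math
import random
from typing import List, Sequence


def intermediate_pvalues(W: Sequence[float]) -> List[float]:
    """One knockoff run -> per-variable p-values (assumed form).

    pi_j = (1 + #{k : W_k <= -W_j}) / p if W_j > 0, and 1 otherwise.
    """
    p = len(W)
    pvals = []
    for w in W:
        if w > 0:
            n_neg = sum(1 for v in W if v <= -w)  # negatives at least as extreme
            pvals.append((1 + n_neg) / p)
        else:
            pvals.append(1.0)  # zero or negative statistic: no evidence
    return pvals


def quantile_aggregate(pval_runs: Sequence[Sequence[float]], gamma: float) -> List[float]:
    """Per variable: min(1, empirical gamma-quantile of {pi_j^(b) / gamma} over runs b)."""
    n_runs = len(pval_runs)
    # Lower empirical quantile: the ceil(gamma * B)-th smallest value (one convention).
    idx = max(0, math.ceil(gamma * n_runs) - 1)
    aggregated = []
    for j in range(len(pval_runs[0])):
        scaled = sorted(run[j] / gamma for run in pval_runs)
        aggregated.append(min(1.0, scaled[idx]))
    return aggregated


def benjamini_hochberg(pvals: Sequence[float], q: float) -> List[int]:
    """BH step-up: reject the k_max smallest p-values, k_max = max{k : p_(k) <= k * q / p}."""
    p = len(pvals)
    order = sorted(range(p), key=lambda j: pvals[j])  # stable: ties keep index order
    k_max = 0
    for rank, j in enumerate(order, start=1):
        if pvals[j] <= rank * q / p:
            k_max = rank  # keep the largest passing rank (step-up, not step-down)
    return sorted(order[:k_max])


def ako_select(W_runs: Sequence[Sequence[float]], q: float = 0.2, gamma: float = 0.5) -> List[int]:
    """Aggregate B runs of knockoff statistics and return the selected variable indices."""
    pval_runs = [intermediate_pvalues(W) for W in W_runs]
    return benjamini_hochberg(quantile_aggregate(pval_runs, gamma), q)


def demo_statistics(p: int = 40, n_signals: int = 12, n_runs: int = 25, seed: int = 0):
    """Hypothetical toy W statistics: the first n_signals variables are strongly positive
    in every run; the rest are symmetric noise that changes from run to run."""
    rng = random.Random(seed)
    return [
        [rng.uniform(5, 10) if j < n_signals else rng.uniform(-1, 1) for j in range(p)]
        for _ in range(n_runs)
    ]


if __name__ == "__main__":
    print(ako_select(demo_statistics(), q=0.2, gamma=0.5))
```

With these toy statistics, every signal has intermediate p-value 1/p in every run. Its aggregated p-value is therefore 2/p = 0.05, below the BH cutoff at rank 12 of 12 * 0.2 / 40 = 0.06, so indices 0 to 11 are always selected. The smallest attainable aggregated value is 1/(gamma * p). As with the original knockoff filter, a problem with very few true signals can therefore end with no selections at all.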

[1] Gilles Blanchard, et al. Adaptive False Discovery Rate Control under Independence and Dependence, 2009, J. Mach. Learn. Res.

[2] Karsten M. Borgwardt, et al. Faculty Opinions recommendation of Panning for gold: ‘model‐X’ knockoffs for high dimensional controlled variable selection, 2019, Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature.

[3] Martin J. Wainwright, et al. Optimal Rates and Tradeoffs in Multiple Testing, 2017, Statistica Sinica.

[4] C. Giraud. Introduction to High-Dimensional Statistics, 2014.

[5] Junyang Qian, et al. Communication-Efficient False Discovery Rate Control via Knockoff Aggregation, 2015, arXiv:1506.05446.

[6] Yoshinobu Kawahara, et al. Efficient network-guided multi-locus association mapping with graph cuts, 2012, Bioinformatics.

[7] Y. Benjamini, et al. The Control of the False Discovery Rate in Multiple Testing under Dependency, 2001.

[9] E. Candès, et al. Controlling the false discovery rate via knockoffs, 2014, arXiv:1404.5609.

[10] Bjarni J. Vilhjálmsson, et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines, 2010.

[11] Ingo Steinwart, et al. A Bernstein-type Inequality for Some Mixing Processes and Dynamical Systems with an Application to Learning, 2015, arXiv:1501.03059.

[12] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[13] Jean-Philippe Vert, et al. kernelPSI: a Post-Selection Inference Framework for Nonlinear Variable Selection, 2019, ICML.

[14] E. Rio, et al. Bernstein inequality and moderate deviations under strong mixing conditions, 2012, arXiv:1202.4777.

[15] Joseph P. Romano, et al. Multiple Data Splitting for Testing, 2019.

[16] Dong Chen, et al. A Putative CCAAT-Binding Transcription Factor Is a Regulator of Flowering Timing in Arabidopsis, 2007, Plant Physiology.

[17] Y. Benjamini, et al. Controlling the false discovery rate: a practical and powerful approach to multiple testing, 1995.

[18] H. Nam, et al. FIONA1 Is Essential for Regulating Period Length in the Arabidopsis Circadian Clock, 2008, The Plant Cell Online.

[19] Gábor Lugosi, et al. Concentration Inequalities: A Nonasymptotic Theory of Independence, 2013.

[20] A. Abdesselam. The weakly dependent strong law of large numbers revisited, 2018, arXiv:1801.09265.

[21] Adel Javanmard, et al. False Discovery Rate Control via Debiased Lasso, 2018, Electronic Journal of Statistics.

[22] Bradley Efron, et al. False Discovery Rate Control, 2010.

[23] Paul-Marie Samson, et al. Concentration of measure inequalities for Markov chains and Φ-mixing processes, 2000.

[24] E. Arias-Castro, et al. Distribution-free Multiple Testing, 2016, arXiv:1604.07520.

[25] G. Lugosi, et al. On Concentration-of-Measure Inequalities, 1998.

[26] James Y. Zou, et al. Improving the Stability of the Knockoff Procedure: Multiple Simultaneous Knockoffs and Entropy Maximization, 2018, AISTATS.

[27] Peter Bühlmann, et al. p-Values for High-Dimensional Regression, 2008, arXiv:0811.2177.

[28] Uri Keich, et al. Controlling the FDR in variable selection via multiple knockoffs, 2019, arXiv:1911.09442.

[29] T. Sun, et al. The Arabidopsis RGA Gene Encodes a Transcriptional Regulator Repressing the Gibberellin Signal Transduction Pathway, 1998, Plant Cell.