Kernel Knockoffs Selection for Nonparametric Additive Models

Thanks to its fine balance between model flexibility and interpretability, the nonparametric additive model has been widely used, and variable selection for this type of model has received sustained attention. However, none of the existing solutions can control the false discovery rate (FDR) in the finite-sample setting. The knockoffs framework is a recent proposal that can effectively control the FDR with a finite sample size, but few knockoffs solutions are applicable to nonparametric models. In this article, we propose a novel kernel knockoffs selection procedure for the nonparametric additive model. We integrate three key components: knockoffs, subsampling for stability, and random feature mapping for nonparametric function approximation. We show that the proposed method is guaranteed to control the FDR under any finite sample size, and achieves a power that approaches one as the sample size tends to infinity. We demonstrate the efficacy of our method through extensive numerical analyses and comparisons with alternative solutions. Our proposal thus makes useful contributions to the methodology of nonparametric variable selection, FDR-based inference, and knockoffs.
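
To make two of the ingredients named above more concrete, the following is a minimal sketch and not the procedure proposed in the article. It assumes a Gaussian kernel approximated by random Fourier features and the standard "knockoff+" data-dependent threshold applied to a vector of importance statistics `w` (large positive values favor the original variable over its knockoff). The function names, the `lengthscale` parameter, and the example statistics are hypothetical; the construction of the knockoff variables and the subsampling step are not shown.

```python
import numpy as np


def random_fourier_features(x, n_features, lengthscale, rng):
    """Map a 1-D covariate to random Fourier features.

    Assumption: a Gaussian kernel with the given lengthscale, so the
    frequencies are drawn from a normal distribution (Bochner's theorem).
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    omega = rng.normal(scale=1.0 / lengthscale, size=(1, n_features))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    # Inner products of these features approximate the kernel values.
    return np.sqrt(2.0 / n_features) * np.cos(x @ omega + phase)


def knockoff_plus_threshold(w, q):
    """Smallest t with (1 + #{w_j <= -t}) / max(1, #{w_j >= t}) <= q.

    Returns np.inf when no candidate threshold meets the target level q.
    """
    w = np.asarray(w, dtype=float)
    candidates = np.sort(np.unique(np.abs(w[w != 0])))
    for t in candidates:
        fdp_hat = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_hat <= q:
            return t
    return np.inf


def knockoff_select(w, q):
    """Indices of variables whose statistic reaches the knockoff+ threshold."""
    w = np.asarray(w, dtype=float)
    return np.flatnonzero(w >= knockoff_plus_threshold(w, q))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = random_fourier_features(np.linspace(-1.0, 1.0, 5), 2000, 1.0, rng)
    print(z.shape)

    # Hypothetical importance statistics for ten candidate variables.
    w_example = [5.0, 4.0, 3.0, 2.5, -1.0, 0.5, 2.0, -0.2, 6.0, 3.5]
    print(knockoff_select(w_example, q=0.2))
```

In a kernel-based additive model, each covariate and its knockoff copy would be expanded into such feature blocks, and one statistic per covariate would be derived from the fitted group of coefficients; how those statistics are computed and aggregated across subsamples is specific to the article and is left out of this sketch.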
