The function-on-scalar LASSO with applications to longitudinal GWAS

We present a new methodology for simultaneous variable selection and parameter estimation in function-on-scalar regression with an ultra-high dimensional predictor vector. We extend the LASSO to functional data in both the $\textit{dense}$ and the $\textit{sparse}$ functional settings. We provide theoretical guarantees that allow the number of predictor variables to be exponentially large. Simulations illustrate the methodology and compare the sparse/functional methods. Using data from the Framingham Heart Study, we demonstrate how our tools can be used in genome-wide association studies, identifying several genetic mutations that affect blood pressure and are therefore important for cardiovascular health.
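To make the idea of a LASSO for functional responses concrete, the following is a minimal sketch and not the authors' implementation. It assumes the dense setting with every curve observed on a common grid, and it assumes a group-LASSO-type objective, $\frac{1}{2n}\|Y - XB\|_F^2 + \lambda \sum_i \|B_i\|_2$, in which each row $B_i$ is the discretized coefficient curve of predictor $i$. The abstract does not state the objective, so this form, the function name `fs_lasso_sketch`, the proximal-gradient solver, and all simulation settings are illustrative assumptions.

```python
import numpy as np

def fs_lasso_sketch(X, Y, lam, n_iter=500):
    """Hypothetical helper: group-penalized least squares for curves on a grid.

    X : (n, p) scalar predictors; Y : (n, m) response curves on m grid points.
    Returns B : (p, m), one discretized coefficient curve per predictor.
    """
    n, p = X.shape
    m = Y.shape[1]
    B = np.zeros((p, m))
    # Step size = 1 / Lipschitz constant of the smooth part's gradient.
    step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n          # gradient of (1/2n)||Y - XB||_F^2
        Z = B - step * grad                   # gradient step
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        # Group soft-thresholding: shrink each whole curve toward zero.
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        B = shrink * Z
    return B

# Toy simulation (assumed settings): p > n, three active predictors.
rng = np.random.default_rng(0)
n, p, m = 100, 200, 20
t = np.linspace(0.0, 1.0, m)
B_true = np.zeros((p, m))
B_true[0] = 2.0 * np.sin(2 * np.pi * t)
B_true[1] = 2.0 * np.cos(2 * np.pi * t)
B_true[2] = 2.0 * t
X = rng.standard_normal((n, p))
Y = X @ B_true + 0.1 * rng.standard_normal((n, m))

B_hat = fs_lasso_sketch(X, Y, lam=2.0)
row_norms = np.linalg.norm(B_hat, axis=1)
top3 = sorted(int(i) for i in np.argsort(row_norms)[-3:])
print("three largest estimated curves belong to predictors:", top3)
```

Because the penalty acts on the norm of an entire curve, a predictor is either dropped for all time points or kept for all of them, which is what makes selection and estimation simultaneous. The Euclidean norm on the grid stands in for the functional $L^2$ norm here, with the grid-spacing factor absorbed into `lam`.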
