Fast Lasso method for Large-scale and Ultrahigh-dimensional Cox Model with applications to UK Biobank

We develop a scalable and highly efficient algorithm to fit a Cox proportional hazards model by maximizing the L1-regularized (Lasso) partial likelihood function, based on the Batch Screening Iterative Lasso (BASIL) method developed by Qian et al. (2019). The output of our algorithm is the full Lasso path, that is, the parameter estimates at every predefined value of the regularization parameter, together with the validation accuracy of each fit, measured by the concordance index (C-index) or the validation deviance. To demonstrate the effectiveness of our algorithm, we analyze a large genotype and survival-time dataset covering 306 disease outcomes from the UK Biobank (Sudlow et al. 2015). Our approach, which we refer to as snpnet-Cox, is implemented in a publicly available package.
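As a rough illustration of the two quantities named above, the sketch below evaluates an L1-penalized Cox objective (negative log partial likelihood plus a Lasso penalty) and a Harrell-style C-index on a small in-memory dataset. It is not the snpnet-Cox implementation: the function names, the per-sample scaling of the loss, the Breslow treatment of tied event times, and the O(n^2) risk-set computation are all assumptions made for readability, and it contains no batch screening.

```python
import numpy as np

def neg_log_partial_likelihood(beta, X, time, event):
    """Negative Cox log partial likelihood, divided by n (assumed scaling).

    Tied event times are handled in the Breslow manner: every subject whose
    time is >= t_i is counted in the risk set of subject i.
    """
    eta = X @ beta
    # at_risk[i, j] is True when subject j is still at risk at time[i]
    at_risk = time[None, :] >= time[:, None]
    m = eta.max()  # shift the exponent for numerical stability
    log_risk = m + np.log(at_risk.astype(float) @ np.exp(eta - m))
    return -np.sum(event * (eta - log_risk)) / len(time)

def lasso_objective(beta, X, time, event, lam):
    """Penalized objective: loss + lam * ||beta||_1 (hypothetical helper)."""
    return neg_log_partial_likelihood(beta, X, time, event) + lam * np.abs(beta).sum()

def c_index(eta, time, event):
    """Harrell-style C-index: a higher score should mean a shorter survival time.

    A pair (i, j) is comparable when subject i has an observed event and
    time[i] < time[j]. Pairs with equal scores count as one half.
    """
    comparable = (time[:, None] < time[None, :]) & (event[:, None] == 1)
    diff = eta[:, None] - eta[None, :]
    concordant = np.sum((diff > 0) & comparable)
    tied = np.sum((diff == 0) & comparable)
    return (concordant + 0.5 * tied) / comparable.sum()

# Toy data (made up for illustration): 4 subjects, 1 feature, 1 censored.
X = np.array([[3.0], [2.0], [1.0], [0.0]])
time = np.array([1.0, 2.0, 3.0, 4.0])
event = np.array([1, 1, 0, 1])
beta = np.array([1.0])

print(lasso_objective(beta, X, time, event, lam=0.5))
print(c_index(X @ beta, time, event))
```

In a path algorithm, an objective of this form would be minimized for a decreasing sequence of `lam` values, and the C-index would be computed on a held-out validation set for each fitted `beta`.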

[1]  M. Rivas, et al.  Phenome-wide Burden of Copy Number Variation in the UK Biobank, 2019, American journal of human genetics.

[2]  D. Postma, et al.  Sequence variants affecting eosinophil numbers associate with asthma and myocardial infarction, 2009, Nature Genetics.

[3]  Trevor Hastie, et al.  A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank, 2019, bioRxiv.

[4]  Matthew Aguirre, et al.  Phenome-wide burden of copy number variation in UK Biobank, 2019, bioRxiv.

[5]  D. Gudbjartsson, et al.  A rare IL33 loss-of-function mutation reduces blood eosinophil counts and protects from asthma, 2017, PLoS genetics.

[6]  P. Bosma.  Inherited disorders of bilirubin metabolism, 2003, Journal of hepatology.

[7]  Christopher M. DeBoever, et al.  Global Biobank Engine: enabling genotype-phenotype browsing for biobank summary statistics, 2018, bioRxiv.

[8]  F. Harrell, et al.  Evaluating the yield of medical tests, 1982, JAMA.

[9]  P. Donnelly, et al.  The UK Biobank resource with deep phenotyping and genomic data, 2018, Nature.

[10]  R. Tibshirani, et al.  Strong rules for discarding predictors in lasso‐type problems, 2010, Journal of the Royal Statistical Society. Series B, Statistical methodology.

[11]  J. Kelsen, et al.  The role of monogenic disease in children with very early onset inflammatory bowel disease, 2017, Current opinion in pediatrics.

[12]  Daniel Lemire, et al.  Faster Population Counts Using AVX2 Instructions, 2016, Comput. J.

[13]  R. Terkeltaub.  Clinical practice. Gout, 2003, The New England journal of medicine.

[14]  Jonathan Taylor, et al.  Statistical learning and selective inference, 2015, Proceedings of the National Academy of Sciences.

[15]  R. Tukey, et al.  Human UDP-glucuronosyltransferases: metabolism, expression, and disease, 2000, Annual review of pharmacology and toxicology.

[16]  David Thomas, et al.  The Art in Computer Programming, 2001.

[17]  D. R. Cox.  Regression Models and Life-Tables, 1972, Journal of the Royal Statistical Society. Series B.

[18]  R. McNamara, et al.  Management of Atrial Fibrillation: Review of the Evidence for the Role of Pharmacologic Therapy, Electrical Cardioversion, and Echocardiography, 2003, Annals of Internal Medicine.

[19]  Robert Tibshirani, et al.  On the Use of C-index for Stratified and Cross-Validated Cox Model, 2019, arXiv:1911.09638.

[20]  A. Hofman, et al.  Association of three genetic loci with uric acid concentration and risk of gout: a genome-wide association study, 2008, The Lancet.

[21]  P. Elliott, et al.  UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age, 2015, PLoS medicine.