A machine learning method for selection of genetic variants to increase prediction accuracy of type 2 diabetes mellitus using sequencing data
