Elastic-net regularization approaches for genome-wide association studies of rheumatoid arthritis

Genome-wide association studies currently aim to identify regions where true disease-causing genes may lie by evaluating thousands of single-nucleotide polymorphisms (SNPs) across the whole genome. Detecting disease-causing genes among so many SNPs raises several challenges, notably multicollinearity and multiple testing, which are most severe when a large number of correlated SNPs are tested simultaneously. Multicollinearity arises when the predictor variables in a multiple regression model are highly correlated, and it can lead to imprecise estimates of association. In this study, we propose a simple stepwise procedure that identifies disease-causing SNPs simultaneously by employing elastic-net regularization, a variable selection method that can accommodate multicollinearity. In Step 1, a single-marker association analysis is conducted to screen SNPs. In Step 2, the screened SNPs are scanned for multiple-marker association using elastic-net regularization. We applied the proposed approach to the rheumatoid arthritis (RA) case-control data set of Genetic Analysis Workshop 16. While the SNPs selected at the screening step are located mostly on chromosome 6, the elastic-net step identified putative RA-related SNPs on other chromosomes in a higher proportion. For some of those putative RA-related SNPs, we identified interactions with sex, a well-known factor affecting RA susceptibility.
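The two-step procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the screening statistic (a 1-df trend statistic, n·r²), the p-value cutoff, the penalty parametrization lam·(alpha·|b|₁ + (1 − alpha)/2·|b|₂²), the proximal-gradient solver, and all function names are hypothetical choices made for the example. Genotypes are assumed to be coded 0/1/2 and disease status 0/1.

```python
import math

def trend_pvalue(genotypes, y):
    """Step 1 statistic (assumed): 1-df trend test, chi2 = n * r^2."""
    n = len(y)
    mg, my = sum(genotypes) / n, sum(y) / n
    sgg = sum((g - mg) ** 2 for g in genotypes)
    syy = sum((v - my) ** 2 for v in y)
    if sgg == 0 or syy == 0:
        return 1.0  # monomorphic SNP or single-class outcome: no signal
    sgy = sum((g - mg) * (v - my) for g, v in zip(genotypes, y))
    chi2 = n * sgy ** 2 / (sgg * syy)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

def standardize(col):
    """Centre a column and scale it to unit (population) variance."""
    n = len(col)
    m = sum(col) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in col) / n)
    return [(v - m) / sd for v in col] if sd > 0 else [0.0] * n

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(z, t):
    """Proximal operator of the L1 penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def elastic_net_logistic(cols, y, lam, alpha, step=0.1, iters=500):
    """Step 2 (assumed solver): proximal gradient descent on the
    elastic-net penalized logistic log-likelihood; intercept unpenalized."""
    n, p = len(y), len(cols)
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        # Residuals (fitted probability minus outcome) at the current fit.
        resid = [
            sigmoid(b0 + sum(beta[j] * cols[j][i] for j in range(p))) - y[i]
            for i in range(n)
        ]
        b0 -= step * sum(resid) / n
        for j in range(p):
            # Gradient of the smooth part: log-likelihood plus ridge term.
            grad = sum(cols[j][i] * resid[i] for i in range(n)) / n
            grad += lam * (1 - alpha) * beta[j]
            beta[j] = soft_threshold(beta[j] - step * grad, step * lam * alpha)
    return b0, beta

def two_step_scan(snps, y, p_cut=1e-3, lam=0.05, alpha=0.5):
    """Screen SNPs one at a time, then fit the survivors jointly.
    Returns {snp_index: coefficient} for SNPs with nonzero coefficients."""
    kept = [j for j, g in enumerate(snps) if trend_pvalue(g, y) < p_cut]
    cols = [standardize(snps[j]) for j in kept]
    _, beta = elastic_net_logistic(cols, y, lam, alpha)
    return {j: b for j, b in zip(kept, beta) if b != 0.0}
```

Calling `two_step_scan(snps, y)` with `snps` as a list of per-SNP genotype lists returns the indices of SNPs that survive both steps together with their standardized coefficients. In practice `lam` and `alpha` would be tuned, for example by cross-validation, and an established solver would be preferable to this hand-written loop for genome-scale data.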
