A Unified Sparse Representation for Sequence Variant Identification for Complex Traits

Joint adjustment for cryptic relatedness and population structure is necessary to reduce bias in DNA sequence analysis; however, existing sparse regression methods model these two confounders separately. Incorporating prior biological information has great potential to enhance statistical power, but such information is often overlooked in existing sparse regression models. We developed a unified sparse regression (USR) method that incorporates prior information and jointly adjusts for cryptic relatedness, population structure, and other environmental covariates. USR models cryptic relatedness as a random effect and population structure as a fixed effect, and uses weighted penalties to incorporate prior knowledge. In extensive simulations, the USR algorithm discovered more true causal variants and maintained a lower false discovery rate than several commonly used feature selection methods. It can handle rare and common variants simultaneously. Applying the USR algorithm to DNA sequence data of Mexican Americans from GAW18, we replicated three hypertension pathways, demonstrating its effectiveness in identifying genetic susceptibility variants.
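The modeling idea summarized above (a random effect for relatedness, fixed effects for population structure, and per-variant penalty weights) can be illustrated with a minimal sketch. This is not the USR algorithm itself, whose penalty and solver are not specified here. It assumes a kinship matrix `K`, a known heritability-like ratio `h2`, a weighted L1 penalty as a stand-in for the paper's weighted penalties, and simulated data; all function names and parameters are hypothetical.

```python
import numpy as np

def whiten(y, X, Z, K, h2):
    """Rotate data so the assumed covariance h2*K + (1-h2)*I becomes identity.
    K is a kinship matrix standing in for cryptic relatedness (random effect)."""
    s, U = np.linalg.eigh(K)
    s = np.clip(s, 0.0, None)                 # guard against tiny negative eigenvalues
    d = 1.0 / np.sqrt(h2 * s + (1.0 - h2))
    rot = lambda A: d[:, None] * (U.T @ A)
    return rot(y[:, None])[:, 0], rot(X), rot(Z)

def residualize(y, X, Z):
    """Project out unpenalized fixed effects Z (intercept, ancestry PCs, covariates)."""
    Q, _ = np.linalg.qr(Z)
    return y - Q @ (Q.T @ y), X - Q @ (Q.T @ X)

def weighted_lasso(y, X, weights, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j weights[j] * |b_j|.
    A smaller weight means a weaker penalty, i.e. stronger prior support."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                              # running residual y - Xb
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam * weights[j], 0.0) / col_sq[j]
            r += X[:, j] * (b[j] - new)
            b[j] = new
    return b

# Simulated example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n, p, m = 100, 50, 500
G = rng.standard_normal((n, m))
K = G @ G.T / m                               # toy kinship matrix
X = rng.standard_normal((n, p))               # candidate variants
Z = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])  # intercept + 2 "PCs"
u = np.sqrt(0.5) * G @ rng.standard_normal(m) / np.sqrt(m)      # relatedness effect
e = np.sqrt(0.5) * rng.standard_normal(n)
y = 2.0 * X[:, 3] + Z @ np.array([1.0, 0.5, -0.5]) + u + e      # variant 3 is causal

weights = np.ones(p)
weights[3] = 0.5                              # hypothetical prior support for variant 3
yw, Xw, Zw = whiten(y, X, Z, K, h2=0.5)
yr, Xr = residualize(yw, Xw, Zw)
b = weighted_lasso(yr, Xr, weights, lam=0.3)
```

In this sketch the whitening step absorbs the random effect, the projection step removes the fixed effects without penalizing them, and only the variant coefficients are shrunk. In practice `h2` would be estimated rather than fixed.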
