Multilocus Association Testing of Quantitative Traits Based on Partial Least-Squares Analysis

Because they combine the genetic information of multiple loci, multilocus association studies (MLAS) are expected to be more powerful than single-locus association studies (SLAS) in disease gene mapping. However, some studies have found that MLAS had similar or reduced power relative to SLAS, which was partly attributed to the larger number of degrees of freedom (dfs) in MLAS. Based on partial least-squares (PLS) analysis, we develop an MLAS approach that avoids large dfs. In this approach, genotypes are first decomposed into PLS components that not only capture the majority of the genetic information of multiple loci but are also relevant to the target traits. The target traits are then regressed on the extracted PLS components to detect association using multiple linear regression. Simulation studies based on real data from the HapMap project were used to assess the performance of our PLS-based MLAS, as well as other popular multiple-linear-regression-based MLAS approaches, under various scenarios that varied the genetic effects and the linkage disequilibrium (LD) structure of the candidate genetic regions. Using the PLS-based MLAS approach, we conducted a genome-wide MLAS of lean body mass and compared it with our previous genome-wide SLAS of lean body mass. Results from the simulations and real data analyses support the improved power of our PLS-based MLAS for disease gene mapping relative to the other three MLAS approaches investigated in this study. We aim to provide an effective and powerful MLAS approach that may help to overcome the limitations of SLAS in disease gene mapping.
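The two-stage procedure described above (extract a few trait-relevant PLS components from the genotypes, then test the trait against them with a k-df regression instead of an m-df one) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: genotypes are assumed to be additively coded as 0/1/2 minor-allele counts, components are extracted with the standard NIPALS PLS1 algorithm for a single trait, and the number of components `k` is a hypothetical, user-chosen parameter (the abstract does not say how it is selected). Because the components are chosen using the trait itself, the nominal F(k, n - k - 1) reference distribution would be anti-conservative, so this sketch instead obtains a p-value by permuting the trait and re-extracting the components each time; the abstract does not state how the authors computed significance. The function names `pls_components`, `f_statistic`, and `pls_mlas_test` are hypothetical.

```python
import numpy as np


def pls_components(X, y, k):
    """Extract up to k PLS1 score vectors (NIPALS) from an n x m genotype matrix X for trait y."""
    Xc = np.asarray(X, dtype=float) - np.mean(X, axis=0)  # center each SNP column
    yc = np.asarray(y, dtype=float) - np.mean(y)          # center the trait
    n = Xc.shape[0]
    T = np.zeros((n, k))
    for j in range(k):
        w = Xc.T @ yc                    # weights: covariance of each (deflated) SNP with the trait
        norm = np.linalg.norm(w)
        if norm == 0.0:                  # no remaining covariance: stop early
            return T[:, :j]
        w /= norm
        t = Xc @ w                       # j-th PLS component (score vector)
        tt = t @ t
        p = Xc.T @ t / tt                # genotype loadings for this component
        Xc = Xc - np.outer(t, p)         # deflate genotypes so later components are orthogonal to t
        yc = yc - t * (t @ yc) / tt      # deflate the trait
        T[:, j] = t
    return T


def f_statistic(T, y):
    """Overall F statistic for regressing y on the k columns of T plus an intercept (k numerator dfs)."""
    y = np.asarray(y, dtype=float)
    n, k = T.shape
    if k == 0:
        return 0.0
    A = np.column_stack([np.ones(n), T])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = resid @ resid
    sst = np.sum((y - y.mean()) ** 2)
    return ((sst - sse) / k) / (sse / (n - k - 1))


def pls_mlas_test(X, y, k=2, n_perm=999, seed=0):
    """PLS-based multilocus test; p-value by permuting y and re-extracting components each time."""
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    f_obs = f_statistic(pls_components(X, y, k), y)
    exceed = 0
    for _ in range(n_perm):
        yp = rng.permutation(y)
        if f_statistic(pls_components(X, yp, k), yp) >= f_obs:
            exceed += 1
    return f_obs, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.binomial(2, 0.3, size=(300, 10)).astype(float)  # toy 0/1/2 genotypes (no LD modeled)
    y = 0.8 * X[:, 3] + rng.normal(size=300)                # toy trait with one causal SNP
    f, p = pls_mlas_test(X, y, k=2, n_perm=199)
    print(f"F = {f:.2f} on 2 numerator dfs, permutation p = {p:.3f}")
```

With `k` components the overall test has `k` numerator dfs regardless of the number of SNPs `m`, which is the df saving the abstract describes. When the centered genotype matrix has full column rank, setting `k = m` makes the components span the same space as the SNPs and recovers the ordinary m-df multiple regression on all SNPs.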
