Relationship between genomic distance‐based regression and kernel machine regression for multi‐marker association testing

To detect genetic association with common and complex diseases, two powerful yet quite different multimarker association tests have been proposed: genomic distance‐based regression (GDBR) (Wessel and Schork [2006] Am J Hum Genet 79:821–833) and kernel machine regression (KMR) (Kwee et al. [2008] Am J Hum Genet 82:386–397; Wu et al. [2010] Am J Hum Genet 86:929–942). GDBR relates a multimarker similarity metric for a group of subjects to the variation in their trait values. KMR instead estimates the effects of the multiple markers on the trait nonparametrically, through a kernel function or kernel matrix. Because the two approaches are both powerful and general yet appear quite different, it is important to understand exactly how they are related. In this report, we show that, when there are no other covariates, there is a striking correspondence between the two approaches for either a quantitative or a binary trait: if the same positive semi‐definite matrix is used as the centered similarity matrix in GDBR and as the kernel matrix in KMR, the F‐test statistic in GDBR and the score test statistic in KMR are equal (up to ignorable constants). The result follows from the connections of both methods to linear or logistic (random‐effects) regression models. Genet. Epidemiol. 35:211–216, 2011. © 2011 Wiley‐Liss, Inc.
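As a rough numerical illustration of the quantitative‐trait case, the sketch below computes both statistics on hypothetical toy data. It assumes textbook forms that this abstract does not spell out: the GDBR pseudo‐F with trait design X = [1, y], F = [tr(HGH)/(m − 1)] / [tr((I − H)G(I − H))/(n − m)] with m = 2, and the KMR score statistic Q = r′Kr/(2σ̂²) with null residuals r = y − ȳ and σ̂² = r′r/(n − 1). The genotypes, trait values, the choice of a linear kernel on column‐centered genotype counts as the shared matrix G = K, and all function names are ours, for illustration only. Under these assumptions both statistics depend on the data only through S = r′Gr/r′r, with Q = (n − 1)S/2 and F = (n − 2)S/(tr G − S).

```python
from itertools import permutations


def center_columns(Z):
    """Subtract each SNP's (column's) mean from a subjects-by-SNPs genotype matrix."""
    n, p = len(Z), len(Z[0])
    means = [sum(row[k] for row in Z) / n for k in range(p)]
    return [[row[k] - means[k] for k in range(p)] for row in Z]


def linear_kernel(Zc):
    """G = Zc Zc^T: positive semi-definite, with every row and column summing to 0."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in Zc] for ri in Zc]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def gdbr_f(G, y):
    """Assumed GDBR pseudo-F for trait design X = [1, y] (m = 2 columns)."""
    n, m = len(y), 2
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    ss = sum(v * v for v in yc)
    # Hat matrix of X = [1, y]; since yc is orthogonal to 1, H = 11'/n + yc yc'/ss.
    H = [[1.0 / n + yc[i] * yc[j] / ss for j in range(n)] for i in range(n)]
    R = [[(1.0 if i == j else 0.0) - H[i][j] for j in range(n)] for i in range(n)]
    explained = trace(matmul(matmul(H, G), H)) / (m - 1)
    residual = trace(matmul(matmul(R, G), R)) / (n - m)
    return explained / residual


def kmr_score(K, y):
    """Assumed KMR score statistic Q = r'Kr / (2 sigma^2) under null model y = b0 + e."""
    n = len(y)
    ybar = sum(y) / n
    r = [v - ybar for v in y]                    # null-model residuals
    sigma2 = sum(v * v for v in r) / (n - 1)     # assumed variance estimator
    quad = sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n))
    return quad / (2 * sigma2)


# Hypothetical toy data: 5 subjects x 3 SNPs (minor-allele counts) and a trait.
Z = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 1], [1, 0, 0]]
y = [1.2, 0.4, 2.5, 0.9, -0.3]
G = linear_kernel(center_columns(Z))  # the same PSD matrix is used in both tests

# Permutation p-values for each statistic over all 5! orderings of y.
perms = [list(p) for p in permutations(y)]
f_obs, q_obs = gdbr_f(G, y), kmr_score(G, y)
p_f = sum(gdbr_f(G, p) >= f_obs - 1e-12 for p in perms) / len(perms)
p_q = sum(kmr_score(G, p) >= q_obs - 1e-12 for p in perms) / len(perms)
print("GDBR F =", f_obs, "KMR Q =", q_obs, "p_F =", p_f, "p_Q =", p_q)
```

Since F is a strictly increasing function of S (and hence of Q) whenever S < tr G, which holds for any PSD G of rank greater than one, the two statistics order all permutations of y identically, so permutation p‐values from either test coincide in this no‐covariate setting. The exact constants used in the paper may differ from the conventions assumed here.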

[1]  Qiuying Sha, et al. A new association test using haplotype similarity, 2007, Genetic Epidemiology.

[2]  Dawei Liu, et al. Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models, 2008, BMC Bioinformatics.

[3]  N. L. Johnson, et al. Multivariate Analysis, 1958, Nature.

[4]  Daniel J. Schaid, et al. Genomic Similarity and Kernel Methods I: Advancements by Building on Mathematical and Statistical Foundations, 2010, Human Heredity.

[5]  Mingyao Li, et al. U‐Statistics‐based Tests for Multiple Genes in Genetic Association Studies, 2008, Annals of Human Genetics.

[6]  Wei Pan, et al. Powerful multi‐marker association tests: unifying genomic distance‐based regression and logistic regression, 2010, Genetic Epidemiology.

[7]  Jung-Ying Tzeng, et al. Haplotype-based association analysis via variance-components score test, 2007, American Journal of Human Genetics.

[8]  J. Magnus, et al. Matrix Differential Calculus with Applications in Statistics and Econometrics, 1991.

[9]  Qizhai Li, et al. Genetic background comparison using distance‐based regression, with applications in population stratification evaluation and adjustment, 2009, Genetic Epidemiology.

[10]  G. Wahba, et al. Some results on Tchebycheffian spline functions, 1971.

[11]  B. Maher. Personal genomes: The case of the missing heritability, 2008, Nature.

[12]  Ao Yuan, et al. Detecting disease gene in DNA haplotype sequences by nonparametric dissimilarity test, 2006, Human Genetics.

[13]  M. Daly, et al. Genetic Mapping in Human Disease, 2008, Science.

[14]  Deanne M. Taylor, et al. Powerful SNP-set analysis for case-control genome-wide association studies, 2010, American Journal of Human Genetics.

[15]  Matthew A. Zapala, et al. Multivariate regression analysis of distance matrices for testing associations between gene expression patterns and related variables, 2006, Proceedings of the National Academy of Sciences.

[16]  Daniel J. Schaid, et al. Nonparametric tests of association of multiple genes with human disease, 2005, American Journal of Human Genetics.

[17]  L. Wasserman, et al. On the identification of disease mutations by the analysis of haplotype similarity and goodness of fit, 2003, American Journal of Human Genetics.

[18]  Xihong Lin, et al. A powerful and flexible multilocus association test for quantitative traits, 2008, American Journal of Human Genetics.

[19]  N. Schork, et al. Generalized genomic distance-based regression methodology for multilocus association analysis, 2006, American Journal of Human Genetics.

[20]  Lambertus Klei, et al. Testing for association based on excess allele sharing in a sample of related cases and controls, 2007, Human Genetics.

[21]  Sara van de Geer, et al. Testing against a high dimensional alternative, 2006.

[22]  Wei Pan, et al. Asymptotic tests of association with multiple SNPs in linkage disequilibrium, 2009, Genetic Epidemiology.

[23]  Jung-Ying Tzeng, et al. Gene‐Trait Similarity Regression for Multimarker‐Based Association Analysis, 2009, Biometrics.

[24]  Brian H. McArdle, et al. Fitting Multivariate Models to Community Data: A Comment on Distance‐Based Redundancy Analysis, 2001.

[25]  Tao Wang, et al. Improved power by use of a weighted score test for linkage disequilibrium mapping, 2007, American Journal of Human Genetics.

[26]  Daniel J. Schaid, et al. Power comparisons between similarity‐based multilocus association methods, logistic regression, and score tests for haplotypes, 2009, Genetic Epidemiology.

[27]  Jason Cooper, et al. Use of unphased multilocus genotype data in indirect association studies, 2004, Genetic Epidemiology.

[28]  Daniel J. Schaid, et al. Genomic Similarity and Kernel Methods II: Methods for Genomic Information, 2010, Human Heredity.

[29]  Larry Wasserman, et al. Outlier Detection and False Discovery Rates for Whole-Genome DNA Matching, 2003.

[30]  Anbupalam Thalamuthu, et al. Association tests using kernel‐based measures of multi‐locus genotype similarity between individuals, 2009, Genetic Epidemiology.