Equivalence of Kernel Machine Regression and Kernel Distance Covariance for Multidimensional Trait Association Studies

Associating genetic markers with a multidimensional phenotype is an important yet challenging problem. In this work, we establish the equivalence between two popular methods: kernel-machine regression (KMR) and kernel distance covariance (KDC). KMR is a semiparametric regression framework that models the covariate effects parametrically and the genetic marker effects nonparametrically. KDC represents a class of methods that includes distance covariance (DC) and the Hilbert-Schmidt Independence Criterion (HSIC), which are nonparametric tests of independence. We show that the score test of KMR and the KDC statistic are equivalent under certain conditions. This result leads to a novel generalization of the KDC test that incorporates covariates. Our contributions are threefold: (1) establishing the equivalence between KMR and KDC; (2) showing that the principles of kernel machine regression can be applied to the interpretation of KDC; and (3) developing a broader class of KDC statistics whose members correspond to different choices of kernel. We demonstrate the proposals using simulation studies. Data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) are used to explore the association between genetic variants in the gene \emph{FLJ16124} and phenotypes represented in 3D structural brain MR images, adjusting for age and gender. The results suggest that SNPs of \emph{FLJ16124} exhibit strong pairwise interaction effects that are correlated with changes in brain region volumes.
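
The flavor of the equivalence can be illustrated numerically in a deliberately simplified setting. The sketch below is not the paper's derivation: it assumes intercept-only covariates, a linear kernel on both genotypes and phenotypes, and it ignores variance scaling and normalization constants. All function names, dimensions, and the simulated data are hypothetical. Under these assumptions, the unscaled quadratic form of the residuals, tr(R^T K R), coincides with the HSIC-style trace statistic tr(K H L H).

```python
import numpy as np

def centering_matrix(n):
    """H = I - 11^T / n, which removes the column mean when applied to data."""
    return np.eye(n) - np.ones((n, n)) / n

def kdc_trace_statistic(K, L):
    """Unnormalized HSIC-style statistic tr(K H L H) for two kernel matrices."""
    H = centering_matrix(K.shape[0])
    return np.trace(K @ H @ L @ H)

def kmr_score_quadratic_form(K, Y, X):
    """Unscaled quadratic form tr(R^T K R), with R the residuals of Y on X.

    Assumption: the residual covariance is treated as identity, so no
    variance scaling is applied.
    """
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    R = Y - X @ beta
    return np.trace(R.T @ K @ R)

# Simulated data (hypothetical sizes): n subjects, p SNPs, q phenotype dimensions.
rng = np.random.default_rng(0)
n, p, q = 50, 5, 3
G = rng.integers(0, 3, size=(n, p)).astype(float)  # genotypes coded 0/1/2
Y = rng.normal(size=(n, q))                        # multidimensional phenotype

K = G @ G.T          # linear kernel on genotypes
L = Y @ Y.T          # linear kernel on phenotypes
X = np.ones((n, 1))  # intercept-only covariate matrix

q_stat = kmr_score_quadratic_form(K, Y, X)
t_stat = kdc_trace_statistic(K, L)
print(np.isclose(q_stat, t_stat))
```

With an intercept-only design, the least-squares residuals are R = HY, so tr(R^T K R) = tr(K H Y Y^T H) = tr(K H L H), which is why the two numbers agree. Supplying an X with additional columns (for example age) replaces H with a different residual projection; how that maps onto the paper's covariate-adjusted KDC test is not specified in this abstract.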
