Equivalence of kernel machine regression and kernel distance covariance for multidimensional phenotype association studies

Associating genetic markers with a multidimensional phenotype is an important yet challenging problem. In this work, we establish the equivalence between two popular methods: kernel machine regression (KMR) and kernel distance covariance (KDC). KMR is a semiparametric regression framework that models covariate effects parametrically and genetic markers nonparametrically, while KDC is a class of methods that includes distance covariance (DC) and the Hilbert-Schmidt independence criterion (HSIC), both of which are nonparametric tests of independence. We show that, under certain conditions, the score test of KMR is equivalent to the KDC statistic, and that this equivalence leads to a novel generalization of the KDC test that incorporates covariates. Our contributions are threefold: (1) establishing the equivalence between KMR and KDC; (2) showing that the principles of KMR can be applied to the interpretation of KDC; and (3) developing a broader class of KDC statistics, whose members correspond to different kernel combinations. Finally, we perform simulation studies and an analysis of real data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study. The ADNI analysis suggests that SNPs of FLJ16124 exhibit pairwise interaction effects that are strongly correlated with changes in brain region volumes.
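
The connection between the two statistics can be illustrated numerically in the simplest setting. The following is a minimal sketch, not the derivation used in the paper: it assumes an intercept-only null model (no covariates), a linear kernel on both genotypes and phenotypes, and unnormalized statistics with all scaling constants omitted. The function names, the simulated data, and the kernel choices are illustrative assumptions. Under those assumptions, the HSIC-type trace statistic tr(KHLH) coincides with the sum over phenotypes of the KMR score-type quadratic form r'Kr, where r is the mean-centered phenotype.

```python
import numpy as np

def centering_matrix(n):
    # H = I - (1/n) 11^T, which removes the sample mean
    return np.eye(n) - np.ones((n, n)) / n

def hsic_type_statistic(K, L):
    # Unnormalized HSIC-type statistic tr(K H L H); scaling constants omitted
    H = centering_matrix(K.shape[0])
    return np.trace(K @ H @ L @ H)

def score_type_statistic(K, Y):
    # Sum over phenotypes of r_j^T K r_j, where r_j are residuals from an
    # intercept-only null model (assumption: no covariates, no variance scaling)
    R = centering_matrix(K.shape[0]) @ Y
    return sum(R[:, j] @ K @ R[:, j] for j in range(Y.shape[1]))

# Simulated data (hypothetical sizes): n subjects, p SNPs, q phenotypes
rng = np.random.default_rng(0)
n, p, q = 50, 10, 3
G = rng.integers(0, 3, size=(n, p)).astype(float)  # genotypes coded 0/1/2
Y = rng.normal(size=(n, q))                        # multidimensional phenotype

K = G @ G.T  # linear genotype kernel (assumed choice)
L = Y @ Y.T  # linear phenotype kernel (assumed choice)

hsic_value = hsic_type_statistic(K, L)
score_value = score_type_statistic(K, Y)
print(np.isclose(hsic_value, score_value))
```

The identity follows from the cyclic property of the trace: tr(KHYY'H) = tr(Y'HKHY). Replacing the centering matrix H with a projection that also removes covariate effects is one way to read the covariate-adjusted generalization described above, although the exact form used in the paper is not given here.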
