An Adaptive Genetic Association Test Using Double Kernel Machines

Gene set-based approaches have recently become popular in gene expression profiling studies for assessing how genetic variants are related to disease outcomes. Because most genes are not differentially expressed, existing pathway tests that consider all genes within a pathway suffer from considerable noise and loss of power. Moreover, for a differentially expressed pathway, it is of interest to select the important genes that drive the pathway's effect. In this article, we propose an adaptive association test using double kernel machines (DKM), which can both select important genes within the pathway and test for the overall genetic pathway effect. The DKM procedure first uses the garrote kernel machine test to select a subset of genes and then uses the least-squares kernel machine test to test the effect of that subset. An appealing feature of the kernel machine framework is that it provides a flexible and unified method for multi-dimensional modeling of the genetic pathway effect, allowing for both parametric and nonparametric components. We illustrate the DKM approach with applications to simulated data and to data from a neuroimaging genetics study.
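The two-stage idea (select a subset of genes, then test the subset with a kernel machine score statistic) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation. It makes three substitutions and choices.

- Stage 1 swaps in a simplified marginal single-gene kernel score screen for the garrote kernel machine test. The garrote test scores each gene jointly with the others.
- Stage 2 uses the least-squares kernel machine variance-component score statistic Q = r'Kr / (2σ̂²) under an intercept-only null model. Its p-value comes from permuting y and rerunning both stages, rather than from an analytic approximation.
- The Gaussian kernel, the bandwidth heuristic `rho`, and the names `dkm_sketch`, `two_stage_stat`, and `n_keep` are all hypothetical.

```python
import numpy as np


def gaussian_kernel(Z, rho):
    """Gaussian kernel K[i, k] = exp(-||z_i - z_k||^2 / rho)."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-np.maximum(d2, 0.0) / rho)


def score_stat(y, K):
    """Variance-component score statistic Q = r'Kr / (2 * sigma2_hat)
    under an intercept-only null model (no other covariates assumed)."""
    r = y - y.mean()
    sigma2 = r @ r / (len(y) - 1)
    return float(r @ K @ r) / (2.0 * sigma2)


def two_stage_stat(y, Z, gene_kernels, n_keep):
    # Stage 1 (screening): rank genes by a marginal single-gene kernel
    # score statistic. Simplified stand-in, NOT the garrote kernel
    # machine test, which scores each gene jointly with the others.
    marginal = np.array([score_stat(y, K) for K in gene_kernels])
    selected = np.sort(np.argsort(marginal)[::-1][:n_keep])
    # Stage 2 (testing): score statistic for a Gaussian kernel on the
    # selected genes only; rho = number of selected genes is a heuristic.
    Zs = Z[:, selected]
    return score_stat(y, gaussian_kernel(Zs, rho=Zs.shape[1])), selected


def dkm_sketch(y, Z, n_keep=3, n_perm=200, seed=0):
    """Hypothetical helper: returns (statistic, selected genes, p-value).

    Both stages are rerun on every permuted y, so the permutation
    p-value accounts for the data-driven selection in stage 1.
    """
    gene_kernels = [gaussian_kernel(Z[:, [j]], rho=1.0)
                    for j in range(Z.shape[1])]
    t_obs, selected = two_stage_stat(y, Z, gene_kernels, n_keep)
    rng = np.random.default_rng(seed)
    t_null = np.array([
        two_stage_stat(rng.permutation(y), Z, gene_kernels, n_keep)[0]
        for _ in range(n_perm)
    ])
    p_value = (1 + np.sum(t_null >= t_obs)) / (n_perm + 1)
    return t_obs, selected, p_value


# Simulated pathway: 100 subjects, 20 genes; gene 0 acts linearly and
# gene 1 nonlinearly, and the other 18 genes are pure noise.
rng = np.random.default_rng(1)
Z = rng.standard_normal((100, 20))
y = 3 * Z[:, 0] + 2 * (Z[:, 1] ** 2 - 1) + rng.standard_normal(100)
t_obs, selected, p = dkm_sketch(y, Z)
print(selected, round(p, 4))
```

Permuting the whole pipeline, rather than only stage 2, matters here. Testing a subset chosen by looking at the same outcome would otherwise overstate significance. Permuting y is valid in this sketch only because the null model has no covariates beyond an intercept.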
