Kernel canonical correlation analysis for assessing gene–gene interactions and application to ovarian cancer

Although single-locus approaches have been widely applied to identify disease-associated single-nucleotide polymorphisms (SNPs), complex diseases are thought to arise from multiple interactions between loci. This has led to the recent development of statistical methods for detecting interactions between two loci. Canonical correlation analysis (CCA) has previously been proposed to detect gene–gene coassociation. However, CCA can detect only linear relations and can be applied only when the number of observations exceeds the number of SNPs in a gene. The sample-size restriction is particularly important for next-generation sequencing, which could yield a large number of novel variants on a limited number of subjects. To overcome these limitations, we propose an approach to detecting gene–gene interactions that is based on a kernelized version of CCA (KCCA). Our simulation studies showed that KCCA controls the type I error rate and is more powerful than leading gene-based approaches under a disease model with negligible marginal effects. To demonstrate the utility of our approach, we also applied KCCA to assess interactions between 200 genes in the NF-κB pathway in relation to ovarian cancer risk in 3869 cases and 3276 controls. We identified 13 significant gene pairs relevant to ovarian cancer risk (local false discovery rate <0.05). Finally, we discuss the advantages of KCCA in gene–gene interaction analysis and its future role in genetic association studies.
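To make the idea of a kernelized CCA concrete, the following is a minimal sketch of a regularized first kernel canonical correlation between the SNP genotypes of two genes. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian (RBF) kernel, its bandwidth `gamma`, the ridge parameter `reg`, and the function names are all hypothetical choices, and the genotypes in the usage lines are random numbers rather than real data. The sketch stops at the correlation itself and does not reproduce the test statistic or significance procedure used in the study.

```python
import numpy as np


def centered_rbf_kernel(G, gamma=None):
    """Centered Gaussian kernel matrix for an (n subjects x p SNPs) genotype matrix.

    The RBF kernel and the default bandwidth 1/p are assumptions for illustration.
    """
    G = np.asarray(G, dtype=float)
    n, p = G.shape
    if gamma is None:
        gamma = 1.0 / p
    sq = np.sum(G ** 2, axis=1)
    # Pairwise squared Euclidean distances between subjects.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (G @ G.T), 0.0)
    K = np.exp(-gamma * d2)
    # Center in feature space: K <- H K H with H = I - 11'/n.
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def first_kernel_canonical_correlation(G1, G2, reg=0.1):
    """Largest regularized kernel canonical correlation between two genes.

    Computed as the top singular value of R1 @ R2, where
    R = (K + c I)^{-1} K and c = reg * n is an assumed ridge penalty.
    Without regularization the correlation is trivially 1 for full-rank kernels.
    """
    K1 = centered_rbf_kernel(G1)
    K2 = centered_rbf_kernel(G2)
    n = K1.shape[0]
    c = reg * n
    R1 = np.linalg.solve(K1 + c * np.eye(n), K1)
    R2 = np.linalg.solve(K2 + c * np.eye(n), K2)
    return float(np.linalg.svd(R1 @ R2, compute_uv=False)[0])


# Usage with simulated genotypes coded 0/1/2 (random, for illustration only).
# Gene A has more SNPs (60) than there are subjects (40), a setting in which
# classical CCA cannot be applied but the n x n kernel formulation still can.
rng = np.random.default_rng(0)
gene_a = rng.integers(0, 3, size=(40, 60))
gene_b = rng.integers(0, 3, size=(40, 25))
rho = first_kernel_canonical_correlation(gene_a, gene_b)
print(f"first kernel canonical correlation: {rho:.3f}")
```

Because all computation happens on n × n kernel matrices, the number of SNPs per gene does not constrain the calculation, and a nonlinear kernel lets the correlation reflect nonlinear dependence between the two genes.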
