Meta-Analysis of SNP-Environment Interaction With Overlapping Data

Meta-analysis, which combines the results of multiple studies, is an important analytical method in genome-wide association studies (GWAS). In practice, the studies combined in a meta-analysis may share overlapping data, which can yield false positive results. Recent studies have proposed models that handle overlapping data when testing the genetic main effect of a single nucleotide polymorphism (SNP). However, there is still no meta-analysis method for testing gene-environment interaction when overlapping data exist. Inspired by the methods for testing the genetic main effect with overlapping data, we proposed an overlapping meta-regression method to address this issue in testing gene-environment interaction. We generalized the covariance matrices of the regular meta-regression model by employing Lin’s and Han’s correlation structures to incorporate the correlations introduced by the overlapping data. Based on the proposed models, we further provided statistical significance tests of the gene-environment interaction as well as of the joint effect of the genetic main effect and the interaction. Through simulations, we examined the type I error rates and statistical power of the proposed methods at different levels of data overlap among studies. We demonstrated that our method controls the type I error well and achieves statistical power comparable with that of the splitting method, which removes overlapping samples before the meta-analysis. Ignoring overlapping data, by contrast, inflates the type I error. Unlike the splitting method, which requires individual-level genotype and phenotype data, our proposed method for testing gene-environment interaction handles overlapping data effectively and with statistical efficiency at the meta-analysis level.
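The core idea of generalizing the meta-regression covariance matrix can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a fixed-effects meta-regression in which each study's SNP effect estimate is regressed on that study's mean environmental exposure, and it assumes that the correlation between two studies' estimates is approximated by n_kl / sqrt(n_k * n_l), where n_kl is the number of shared subjects (a commonly used form for quantitative traits; the abstract does not state the exact formula). The function names `overlap_correlation` and `overlapping_meta_regression` are hypothetical.

```python
import math
import numpy as np


def overlap_correlation(n, n_shared):
    """Between-study correlation matrix from sample sizes and shared counts.

    Assumption: corr(k, l) ~= n_kl / sqrt(n_k * n_l) for a quantitative trait.
    """
    n = np.asarray(n, dtype=float)
    corr = np.asarray(n_shared, dtype=float) / np.sqrt(np.outer(n, n))
    np.fill_diagonal(corr, 1.0)
    return corr


def overlapping_meta_regression(beta, se, env_mean, corr):
    """Generalized least squares fit of beta_k = b_G + b_GE * E_k + error_k.

    The error covariance is Omega[k, l] = corr[k, l] * se[k] * se[l], so the
    off-diagonal terms carry the dependence caused by overlapping subjects.
    """
    beta = np.asarray(beta, dtype=float)
    se = np.asarray(se, dtype=float)
    env_mean = np.asarray(env_mean, dtype=float)

    X = np.column_stack([np.ones_like(env_mean), env_mean])
    omega_inv = np.linalg.inv(np.asarray(corr, dtype=float) * np.outer(se, se))

    cov = np.linalg.inv(X.T @ omega_inv @ X)   # covariance of (b_G, b_GE)
    coef = cov @ X.T @ omega_inv @ beta        # GLS estimates

    # Wald statistics: 1 df for the interaction, 2 df for the joint test.
    wald_int = coef[1] ** 2 / cov[1, 1]
    wald_joint = float(coef @ np.linalg.inv(cov) @ coef)

    return {
        "coef": coef,
        "cov": cov,
        # Chi-square survival functions in closed form for 1 and 2 df.
        "p_interaction": math.erfc(math.sqrt(wald_int / 2.0)),
        "p_joint": math.exp(-wald_joint / 2.0),
    }
```

Passing an identity matrix as `corr` reduces the fit to ordinary inverse-variance weighted meta-regression, which corresponds to ignoring the overlap; supplying the matrix from `overlap_correlation` is what lets the test account for shared subjects using summary statistics only.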
