Inferring gene-environment interaction from case-parent trio data: evaluation of and adjustment for spurious GxE and development of a data-smoothing method to uncover true GxE

Most complex diseases are influenced jointly by genes (G) and environmental or nongenetic attributes (E). Gene-environment interaction (G×E) is measured by statistical interaction between G and E, which occurs when genotype relative risks (GRRs) vary with E. In this thesis, we explore the sources of spurious G×E and propose a data-smoothing approach to G×E for case-parent trio data. In the first project, we address the problem of making inference about G×E based on the transmission rates of alleles from parents to affected offspring. Because GRRs that vary with E lead to transmission rates that also vary with E, transmission rates have been used to make inference about G×E. However, transmission-based tests of G×E are found to be invalid in general. To understand the bias of the transmission-based test, we derive theoretical transmission rates and compare their variation with E to that of the GRRs. Through simulation, we investigate the practical implications of the bias. Valid approaches that are not based on transmission rates either require specifying a parametric form for G×E or are designed to work well under one. In the second project, we develop a data-smoothing method to explore G×E that does not require model specification for the interaction component when we work with genotypes for a causal marker. The data-driven method produces graphical displays of G×E that suggest its form. To test the significance of G×E, we take a permutation approach to account for the additional uncertainty introduced by the smoothing process. For many approaches to inference of G×E with case-parent trio data, including our own, a key assumption is that the test marker is causal; in reality, however, it may not be causal but merely in linkage disequilibrium with a causal locus. In this case, the approaches can give a false impression of G×E due to a form of population
