Longitudinal analysis strategies for modelling epigenetic trajectories

Abstract

Background: DNA methylation levels are known to vary over time, and modelling these trajectories is crucial for understanding the biological relevance of these changes. However, because fitting multilevel models across the epigenome is computationally costly, most trajectory modelling efforts to date have focused on a subset of CpG sites identified through epigenome-wide association studies (EWAS) at individual time-points.

Methods: We propose fitting linear regression across the repeated measures and estimating cluster-robust standard errors with a sandwich estimator, as a less computationally intensive strategy than multilevel modelling. We compared these two longitudinal approaches, together with three EWAS-based approaches (selecting CpG sites associated at baseline, at any time-point, or at all time-points), for identifying exposure-related epigenetic change over time. The comparison used simulations and blood DNA methylation profiles from the Accessible Resource for Integrated Epigenomics Studies (ARIES).

Results: Restricting association testing to EWAS at baseline identified a less complete set of associations than performing EWAS at each time-point or applying the longitudinal modelling approaches to the full dataset. Linear regression models with cluster-robust standard errors identified similar sets of associations to the multilevel models, with almost identical effect estimates, while also being 74 times more efficient. Both longitudinal modelling approaches identified comparable sets of CpG sites in ARIES associated with prenatal exposure to smoking (>70% agreement).

Conclusions: Linear regression with cluster-robust standard errors is an appropriate and efficient approach for longitudinal analysis of DNA methylation data.
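As an illustration of the cluster-robust strategy proposed in the Methods, the following is a minimal, dependency-free Python sketch for a single CpG site. It is not the authors' implementation. It assumes the basic sandwich form (often called CR0, with no small-sample correction), treats each individual as one cluster, and uses simulated data. All names and parameter values (`ols_cluster_robust`, the simulated effect sizes, the three time-points) are hypothetical.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(A):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def ols_cluster_robust(X, y, clusters):
    """Return OLS coefficients and cluster-robust (CR0 sandwich) standard errors."""
    k = len(X[0])
    bread = inverse(matmul(transpose(X), X))  # (X'X)^-1
    Xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(k)]
    beta = [sum(bread[j][m] * Xty[m] for m in range(k)) for j in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    # Sum the score contributions X_i * e_i within each cluster (individual).
    scores = {}
    for row, e, g in zip(X, resid, clusters):
        s = scores.setdefault(g, [0.0] * k)
        for j in range(k):
            s[j] += row[j] * e
    # "Meat" of the sandwich: sum over clusters of s_g s_g'.
    meat = [[sum(s[a] * s[b] for s in scores.values()) for b in range(k)]
            for a in range(k)]
    cov = matmul(matmul(bread, meat), bread)
    # max(..., 0.0) guards against tiny negative values from rounding.
    se = [max(cov[j][j], 0.0) ** 0.5 for j in range(k)]
    return beta, se

# Simulated example (assumed values): 200 individuals, 3 time-points each,
# a binary exposure, and an exposure-by-time interaction of 0.05.
random.seed(1)
X, y, clusters = [], [], []
for person in range(200):
    exposure = float(person % 2)
    person_effect = random.gauss(0.0, 0.1)  # induces within-person correlation
    for t in (0.0, 1.0, 2.0):
        X.append([1.0, t, exposure, exposure * t])
        y.append(0.5 + 0.1 * t + 0.2 * exposure + 0.05 * exposure * t
                 + person_effect + random.gauss(0.0, 0.05))
        clusters.append(person)

beta, se = ols_cluster_robust(X, y, clusters)
for name, b, s in zip(["intercept", "time", "exposure", "exposure:time"], beta, se):
    print(f"{name:>14}: estimate={b:.4f}  cluster-robust SE={s:.4f}")
```

The point estimates are ordinary least squares. Only the variance is adjusted for repeated measures within individuals, which is why this is cheaper per CpG site than fitting a multilevel model. In practice an established routine would normally be used, for example statsmodels' `OLS(...).fit(cov_type="cluster", cov_kwds={"groups": ...})`.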
