Genome-wide methylation data mirror ancestry information

Background: Genetic data are known to harbor information about human demographics, and genotyping data are commonly used for capturing ancestry information by leveraging genome-wide differences between populations. In contrast, it is not clear to what extent population structure is captured by whole-genome DNA methylation data.

Results: We demonstrate, using three large-cohort 450K methylation array data sets, that ancestry information signal is mirrored in genome-wide DNA methylation data and that it can be further isolated more effectively by leveraging the correlation structure of CpGs with cis-located SNPs. Based on these insights, we propose a method, EPISTRUCTURE, for the inference of ancestry from methylation data, without the need for genotype data.

Conclusions: EPISTRUCTURE can be used to infer ancestry information of individuals based on their methylation data in the absence of corresponding genetic data. Although genetic data are often collected in epigenetic studies of large cohorts, these are typically not made publicly available, making the application of EPISTRUCTURE especially useful for anyone working on public data. Implementation of EPISTRUCTURE is available in GLINT, our recently released toolset for DNA methylation analysis at: http://glint-epigenetics.readthedocs.io.
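The general idea of isolating structure from a subset of informative CpGs can be illustrated with a minimal sketch. This is not the EPISTRUCTURE implementation or the GLINT API: the function name `top_methylation_pcs`, the toy data, and the way the informative CpGs are chosen are all assumptions made for illustration. Here the "informative" sites are simply simulated to differ between two groups, standing in for CpGs that correlate with cis-located SNPs, and ordinary principal component analysis is applied to that subset.

```python
import numpy as np

def top_methylation_pcs(meth, cpg_idx, k=2):
    """Return the top-k principal component scores per sample,
    computed only from the selected CpG columns.

    meth    : (n_samples, n_cpgs) array of methylation levels
    cpg_idx : indices of CpGs assumed (hypothetically) to be informative
    """
    X = np.asarray(meth, dtype=float)[:, cpg_idx]
    X = X - X.mean(axis=0)                      # center each CpG
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * S[:k]                     # sample scores on the top-k PCs

# Toy data (simulated, not real methylation measurements):
# 60 samples in two groups, 200 CpGs, of which the first 20 differ by group.
rng = np.random.default_rng(0)
n, m = 60, 200
group = np.repeat([0, 1], n // 2)
meth = rng.normal(0.5, 0.05, size=(n, m))
informative = np.arange(20)
meth[:, informative] += 0.2 * group[:, None]

pcs = top_methylation_pcs(meth, informative, k=2)
```

In this toy setting the first component separates the two simulated groups; with real data, the choice of which CpGs to include is the substantive step and is not reproduced here.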
