Genome-wide DNA methylation and gene expression patterns reflect genetic ancestry and environmental differences across the Indonesian archipelago

Indonesia is the world’s fourth most populous country, host to striking levels of human diversity, regional patterns of admixture, and varying degrees of introgression from both Neanderthals and Denisovans. However, it has been largely excluded from the human genomics sequencing boom of the last decade. To serve as a benchmark dataset of molecular phenotypes across the region, we generated genome-wide CpG methylation and gene expression measurements in over 100 individuals from three locations that capture the major genomic and geographical axes of diversity across the Indonesian archipelago. Investigating between- and within-island differences, we find that up to 10% of tested genes are differentially expressed between the islands of Mentawai (Sumatra) and New Guinea. Variation in gene expression is closely associated with DNA methylation, with expression levels of 9.7% of genes strongly correlating with nearby CpG methylation, and many of these genes being differentially expressed between islands. Genes identified in our differential expression and methylation analyses are enriched in pathways involved in immunity, highlighting tropical Indonesia's role as a source of infectious disease diversity and the strong selective pressures these diseases have exerted on humans. Finally, we identify robust within-island variation in DNA methylation and gene expression, likely driven by very local environmental differences across sampling sites. Together, these results strongly suggest complex relationships between DNA methylation, transcription, archaic hominin introgression and immunity, all jointly shaped by the environment. This has implications for the application of genomic medicine, both in critically understudied Indonesia and globally, and will allow a better understanding of the interacting roles of genomic and environmental factors shaping molecular and complex phenotypes.
