Cell-type deconvolution in epigenome-wide association studies: a review and recommendations.

A major challenge faced by epigenome-wide association studies (EWAS) is cell-type heterogeneity. As many EWAS have already demonstrated, adjusting for changes in cell-type composition can be critical when analyzing and interpreting findings from such studies. Because of the importance of this adjustment, many statistical algorithms that adjust for cell-type composition have been proposed. Some of these methods are 'reference based,' in that they require a priori defined reference DNA methylation profiles of the cell types present in the tissue of interest, while other algorithms are 'reference free.' At present, however, it is unclear how best to adjust for cell-type heterogeneity, as the best approach may depend largely on the type of tissue and the phenotype being considered. Here, we provide a critical review of the major existing algorithms for correcting for cell-type composition in the context of Illumina Infinium Methylation Beadarrays, with the aim of providing useful recommendations to the EWAS community.
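To make the 'reference based' idea concrete, the following is a minimal sketch and not the implementation of any specific algorithm covered in the review. It assumes that a mixed-tissue methylation profile is approximately a weighted sum of reference cell-type profiles, and estimates the weights by non-negative least squares using projected gradient descent. The function name `estimate_fractions` and all numeric values are hypothetical and chosen only for illustration.

```python
def estimate_fractions(reference, sample, steps=2000):
    """Estimate cell-type fractions for one sample (illustrative sketch).

    reference: one row per CpG; each row lists the beta value (0..1) of that
               CpG in every reference cell type.
    sample:    beta values measured in the mixed tissue, one per CpG.
    Returns non-negative weights rescaled to sum to 1.
    """
    n_cpgs = len(reference)
    n_types = len(reference[0])
    # Beta values lie in [0, 1], so 1 / n_types is a safe step size here.
    step_size = 1.0 / n_types
    weights = [1.0 / n_types] * n_types  # start from equal fractions

    for _ in range(steps):
        # Residual of the linear mixture model at every CpG.
        residuals = [
            sum(r * w for r, w in zip(row, weights)) - y
            for row, y in zip(reference, sample)
        ]
        # Gradient of the mean squared error with respect to each weight.
        grads = [
            sum(residuals[i] * reference[i][k] for i in range(n_cpgs)) / n_cpgs
            for k in range(n_types)
        ]
        # Gradient step, then projection onto the non-negative orthant.
        weights = [max(0.0, w - step_size * g) for w, g in zip(weights, grads)]

    total = sum(weights)
    return [w / total for w in weights] if total > 0 else weights


# Hypothetical reference profiles: 4 CpGs (rows) x 3 cell types (columns).
reference_profiles = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.8, 0.8, 0.1],
]
true_fractions = [0.6, 0.3, 0.1]

# Simulate a noise-free mixed sample from the assumed linear mixture model.
mixed_sample = [
    sum(r * f for r, f in zip(row, true_fractions)) for row in reference_profiles
]

estimated = estimate_fractions(reference_profiles, mixed_sample)
print([round(f, 3) for f in estimated])
```

On this noise-free toy input the estimated fractions come out close to the simulated ones. The estimated fractions could then be included as covariates in the association model. Reference-free methods, by contrast, must infer the sources of variation without `reference_profiles`.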
