TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages

Biotechnological advances in sequencing have led to an explosion of publicly available data via large international consortia such as The Cancer Genome Atlas (TCGA), The Encyclopedia of DNA Elements (ENCODE), and The NIH Roadmap Epigenomics Mapping Consortium (Roadmap). These projects have provided unprecedented opportunities to interrogate the epigenome of cultured cancer cell lines, as well as normal and tumor tissues, with high genomic resolution. The Bioconductor project offers more than 1,000 open-source software and statistical packages to analyze high-throughput genomic data. However, most packages are designed for specific data types (e.g. expression, epigenetics, genomics), and no single comprehensive tool provides a complete integrative analysis of the resources and data provided by all three public projects. The need to integrate these different analyses was recently raised. In this workflow, we provide a series of biologically focused integrative analyses of different molecular data. We describe how to download, process and prepare TCGA data, and how to extract biologically meaningful genomic and epigenomic data by harnessing several key Bioconductor packages. Using Roadmap and ENCODE data, we provide a work plan to identify biologically relevant functional epigenomic elements associated with cancer. To illustrate our workflow, we analyzed two types of brain tumors: low-grade glioma (LGG) versus high-grade glioma (glioblastoma multiforme, or GBM). This workflow introduces the following Bioconductor packages: AnnotationHub, ChIPseeker, ComplexHeatmap, pathview, ELMER, GAIA, minet, RTCGAToolbox, TCGAbiolinks.
