A comprehensive map of genome-wide gene regulation in Mycobacterium tuberculosis

Mycobacterium tuberculosis (MTB) is a pathogenic bacterium responsible for 12 million active cases of tuberculosis (TB) worldwide. Despite extensive research, the complexity of MTB pathogenicity and its critical regulatory components remain poorly understood. In this study, we constructed the first systems-scale map of transcription factor (TF) binding sites and their regulatory target proteins in MTB. We generated FLAG-tagged overexpression constructs for 206 TFs in MTB, used ChIP-seq to identify genome-wide binding events, and surveyed global transcriptomic changes for each overexpressed TF. Here we present the data underlying the most comprehensive map of MTB gene regulation to date. We also define elaborate quality control measures and extensive filtering steps, and report the gene-level overlap between the ChIP-seq and microarray datasets. Further, we describe the use of the TF overexpression datasets to validate a global gene regulatory network model of MTB, and we describe an online resource for exploring the datasets.
