NetProphet 2.0: mapping transcription factor networks by exploiting scalable data resources

Abstract

Motivation: Cells process information, in part, through transcription factor (TF) networks, which control the rates at which individual genes produce their products. A TF network map is a graph that indicates which TFs bind and directly regulate each gene. Previous work has described network mapping algorithms that rely exclusively on gene expression data and ‘integrative’ algorithms that exploit a wide range of data sources including chromatin immunoprecipitation sequencing (ChIP-seq) of many TFs, genome-wide chromatin marks, and binding specificities for many TFs determined in vitro. However, such resources are available only for a few major model systems and cannot be easily replicated for new organisms or cell types.

Results: We present NetProphet 2.0, a ‘data light’ algorithm for TF network mapping, and show that it is more accurate at identifying direct targets of TFs than other, similarly data light algorithms. In particular, it improves on the accuracy of NetProphet 1.0, which used only gene expression data, by exploiting three principles. First, combining multiple approaches to network mapping from expression data can improve accuracy relative to the constituent approaches. Second, TFs with similar DNA binding domains bind similar sets of target genes. Third, even a noisy, preliminary network map can be used to infer DNA binding specificities from promoter sequences, and these inferred specificities can be used to further improve the accuracy of the network map.

Availability and implementation: Source code and comprehensive documentation are freely available at https://github.com/yiming-kang/NetProphet_2.0.

Supplementary information: Supplementary data are available at Bioinformatics online.
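
As a rough illustration of the first principle in the abstract (combining multiple expression-based mapping approaches), the sketch below averages rank-normalized edge scores from two hypothetical methods. It is not NetProphet 2.0's implementation: the function names, the rank-averaging rule, and the toy scores are all assumptions made for illustration only.

```python
def rank_scores(scores):
    """Map each (tf, gene) edge to a rank-based score in (0, 1].

    Higher raw score gives a higher rank score. Ties are broken by
    input order, which is a simplification for this sketch.
    """
    ordered = sorted(scores, key=lambda edge: scores[edge])
    n = len(ordered)
    return {edge: (i + 1) / n for i, edge in enumerate(ordered)}


def combine_networks(*networks):
    """Average rank-normalized scores across methods (hypothetical rule).

    An edge that a method did not score contributes 0 for that method.
    """
    ranked = [rank_scores(net) for net in networks]
    edges = set().union(*ranked)
    return {e: sum(r.get(e, 0.0) for r in ranked) / len(ranked) for e in edges}


# Toy edge scores from two hypothetical methods on different scales.
method_a = {("TF1", "g1"): 0.9, ("TF1", "g2"): 0.1, ("TF2", "g1"): 0.5}
method_b = {("TF1", "g1"): 12.0, ("TF1", "g2"): 7.0, ("TF2", "g1"): 3.0}

combined = combine_networks(method_a, method_b)
for edge, score in sorted(combined.items(), key=lambda kv: -kv[1]):
    print(edge, round(score, 3))
```

Rank normalization is used here only because the two toy methods report scores on different scales; an edge ranked highly by both methods ends up at the top of the combined list.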
