RIP: the regulatory interaction predictor - a machine learning-based approach for predicting target genes of transcription factors

MOTIVATION: Understanding transcriptional gene regulation is essential for studying cellular systems. Identifying the genome-wide targets of transcription factors (TFs) provides a basis for discovering how TFs, and cooperation between TFs, are involved in cellular systems and pathogenesis.

RESULTS: We present the regulatory interaction predictor (RIP), a machine learning approach that inferred 73 923 regulatory interactions (RIs) for 301 human TFs and 11 263 target genes with good quality and 4516 RIs with very high quality. The inference of RIs is independent of any specific condition. Our approach employs support vector machines (SVMs) trained on a set of experimentally proven RIs from a public repository (TRANSFAC). The features of RIs used for learning are based on a correlation meta-analysis of 4064 gene expression profiles from 76 studies, on in silico predictions of transcription factor binding sites (TFBSs), and on combinations of these that exploit knowledge about the co-regulation of genes by a common TF (TF-module). The trained SVMs were applied to infer new RIs for a large set of TFs and genes. In a case study, we used the inferred RIs to analyze an independent microarray dataset. We identified key TFs regulating the transcriptional response of monocytes to interferon alpha stimulation, most prominently interferon-stimulated gene factor 3 (ISGF3). Furthermore, predicted TF-modules were strongly associated with their functionally related pathways.

CONCLUSION: Descriptors of gene expression, TFBS predictions, experimentally verified binding information and statistical combinations of these enabled RIs to be inferred on a genome-wide scale for human genes with good precision, providing a sound basis for expression profiling studies.

CONTACT: r.koenig@dkfz.de

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
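As an illustration of the correlation meta-analysis named in the abstract, the sketch below pools the TF-gene expression correlation across several studies into a single feature value. The abstract does not say how the per-study correlations were combined, so the pooling rule used here (Fisher z-transform with weights of n - 3) is an assumption, and the function names `pearson` and `meta_correlation` are hypothetical. It is a minimal sketch, not the authors' implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant profile carries no correlation information.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def meta_correlation(studies):
    """Pool per-study TF-gene correlations into one value.

    `studies` is a list of (tf_expression, gene_expression) pairs, one
    pair per study. ASSUMPTION: correlations are combined with the
    Fisher z-transform, weighting each study by (n - 3); the abstract
    does not specify the pooling rule.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for tf_expr, gene_expr in studies:
        n = len(tf_expr)
        if n < 4:
            continue  # the weight n - 3 must be positive
        r = pearson(tf_expr, gene_expr)
        r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
        weighted_sum += (n - 3) * math.atanh(r)
        total_weight += n - 3
    if total_weight == 0:
        raise ValueError("no study with at least 4 samples")
    return math.tanh(weighted_sum / total_weight)


if __name__ == "__main__":
    # Toy data: two hypothetical studies of one TF-gene pair.
    study_a = ([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
    study_b = ([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
    print(round(meta_correlation([study_a, study_b]), 3))
```

A pooled value of this kind could serve as one entry of the feature vector for a TF-gene pair, alongside TFBS prediction scores, before a classifier such as an SVM is trained on known RIs.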
