PASTAA: identifying transcription factors associated with sets of co-regulated genes

Motivation: A major challenge in regulatory genomics is the identification of associations between functional categories of genes (e.g. tissues, metabolic pathways) and their regulating transcription factors (TFs). The regulating TFs are already known for a limited number of categories, but for many functional categories the responsible factors remain to be elucidated.

Results: We put forward a novel method (PASTAA) for detecting transcription factors associated with functional categories, which utilizes the predicted binding affinities of a TF to promoters. For each gene, this binding strength is compared with the likelihood that the gene belongs to the functional category under study. Coherence between the two resulting gene rankings is taken as an indicator of association between the TF and the category. PASTAA is applied primarily to the determination of TFs driving tissue-specific expression. We show that PASTAA recovers many TFs known to act tissue specifically and, in addition, provides novel associations not detected so far by alternative methods. A remarkable number of the associations found for tissue-specific gene expression are experimentally supported. The validated success on various datasets implies that PASTAA can be applied directly to detect TFs associated with newly derived gene sets.

Availability: The PASTAA source code as well as a corresponding web interface is freely available at http://trap.molgen.mpg.de

Contact: roider@molgen.mpg.de

Supplementary information: Supplementary data are available at Bioinformatics online.
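The rank comparison described in the Results can be illustrated with a minimal sketch. The abstract does not specify how coherence between the two rankings is scored, so the hypergeometric overlap test below, the cutoff scan, and all names (`hypergeom_tail`, `rank_coherence`, the toy gene scores) are assumptions made for illustration only, not the published implementation.

```python
from math import comb


def hypergeom_tail(overlap, n_total, n_a, n_b):
    """P(X >= overlap) when n_b genes are drawn at random from n_total,
    of which n_a are marked (hypergeometric upper tail)."""
    upper = min(n_a, n_b)
    favourable = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(n_total, n_b)


def rank_coherence(affinity, membership, cutoffs):
    """Hypothetical coherence score: the smallest overlap tail probability
    over all pairs of top-k cutoffs applied to the two rankings.

    affinity   -- dict gene -> predicted TF binding affinity (assumed input)
    membership -- dict gene -> category membership score (assumed input)
    cutoffs    -- list sizes to test, each at most the number of genes
    """
    genes = list(affinity)
    by_affinity = sorted(genes, key=affinity.get, reverse=True)
    by_membership = sorted(genes, key=membership.get, reverse=True)
    best = 1.0
    for cut_a in cutoffs:
        top_a = set(by_affinity[:cut_a])
        for cut_m in cutoffs:
            top_m = set(by_membership[:cut_m])
            p = hypergeom_tail(len(top_a & top_m), len(genes), cut_a, cut_m)
            best = min(best, p)
    return best


if __name__ == "__main__":
    # Toy data: eight genes with made-up scores.
    affinity = {f"g{i}": 8 - i for i in range(8)}
    coherent = dict(affinity)                      # same ordering
    reversed_ = {g: -s for g, s in affinity.items()}  # opposite ordering
    print(rank_coherence(affinity, coherent, [2, 4]))
    print(rank_coherence(affinity, reversed_, [2, 4]))
```

A small score indicates that genes with high predicted affinity also rank high for category membership. Because the minimum is taken over several cutoff pairs, the value is a raw score rather than a corrected p-value; a real analysis would need a multiple-testing adjustment.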
