Genome-wide Analysis of Functions Regulated by Sets of Transcription Factors

We present a pipeline for inferring biological functions regulated by combinatorial interactions of transcription factors. Using a robust statistical method, the pipeline intersects the presence of transcription factor binding sites in gene upstream sequences with the Gene Ontology terms associated with those genes. The pipeline takes positional frequency matrices for the transcription factors as input and reports significantly enriched biological processes as output. We demonstrate the pipeline on two groups of transcription factors: the cell-cycle-related family of E2F factors and an NFAT/AP-1 pair involved in the immune response. In both cases the reported results agree well with experimental knowledge. Furthermore, novel functions are predicted for the NFAT/AP-1 composite element.
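The enrichment step described above can be illustrated with a minimal sketch. The abstract does not name the statistical method, so this example assumes a one-sided hypergeometric test with a Bonferroni correction, which is a common choice for Gene Ontology enrichment and not necessarily the authors' procedure. The function names (`hypergeom_sf`, `enriched_terms`) and the input layout (a dict mapping each gene to its set of GO terms, plus a set of genes with predicted binding sites) are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n drawn)."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enriched_terms(target_genes, annotations, alpha=0.05):
    """Report GO terms over-represented among genes with predicted binding sites.

    annotations:  dict mapping gene id -> set of GO terms (hypothetical layout)
    target_genes: genes whose upstream sequence contains predicted sites
    Assumption: hypergeometric test with Bonferroni correction over all terms.
    """
    background = set(annotations)
    targets = set(target_genes) & background
    N, n = len(background), len(targets)
    terms = set().union(*annotations.values())
    results = []
    for term in terms:
        K = sum(1 for g in background if term in annotations[g])  # annotated in background
        k = sum(1 for g in targets if term in annotations[g])     # annotated among targets
        p_adj = min(1.0, hypergeom_sf(k, N, K, n) * len(terms))   # Bonferroni-adjusted
        if p_adj <= alpha:
            results.append((term, k, K, p_adj))
    return sorted(results, key=lambda r: r[3])
```

Calling `enriched_terms(targets, annotations)` returns tuples of (term, annotated targets, annotated background genes, adjusted p-value), sorted by adjusted p-value. In a real run, the target set would come from scanning upstream sequences with the positional frequency matrices, which this sketch leaves out.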
