Identification of Over-Represented Combinations of Transcription Factor Binding Sites in Sets of Co-Expressed Genes

Transcriptional regulation is mediated by combinatorial interactions between diverse trans-acting proteins and arrays of cis-regulatory sequences. Revealing this complex interplay between transcription factors and their binding sites remains a fundamental problem in understanding the flow of genetic information. The oPOSSUM analysis system helps interpret gene expression data by analyzing the transcription factor binding sites shared by sets of co-expressed genes. The system combines cross-species sequence comparison for phylogenetic footprinting with motif models for binding site prediction. We introduce a new set of analysis algorithms for studying the combinatorial properties of transcription factor binding sites shared by sets of co-expressed genes. The new methods circumvent the computational challenges of this combinatorial search by focusing on families of transcription factors with similar binding properties rather than on individual factors. The algorithm accurately identifies combinations of binding sites over-represented in reference collections and clarifies the results obtained by existing methods that study binding sites in isolation.
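The abstract does not spell out the statistics used. The following is only a minimal sketch of one way to test a combination of binding-site families for over-representation, not the oPOSSUM implementation. It assumes that each gene has already been reduced to the set of transcription factor families (the names "MADS", "bHLH" and "ETS" are illustrative) with at least one predicted, conserved site. It then applies a one-sided Fisher exact test (the hypergeometric upper tail) to ask whether genes carrying a given pair of families are enriched in the co-expressed set relative to the background. The functions `hypergeom_sf`, `family_pairs` and `overrepresented_pairs` are hypothetical names.

```python
from collections import Counter
from itertools import combinations
from math import comb


def hypergeom_sf(k, N, K, n):
    """Upper tail P(X >= k) for X ~ Hypergeometric(N genes, K carriers, n drawn).

    This is the one-sided Fisher exact p-value for enrichment.
    """
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def family_pairs(families):
    """All unordered pairs of TF families present in one gene's conserved regions."""
    return {frozenset(p) for p in combinations(sorted(families), 2)}


def overrepresented_pairs(target, background_hits, alpha=0.05):
    """Hypothetical helper: rank TF-family pairs enriched in a co-expressed set.

    target: ids of the co-expressed genes (assumed to be a subset of the background).
    background_hits: gene id -> set of TF families with a predicted, conserved site
        (assumed to come from an upstream phylogenetic footprinting step).
    Returns (pair, genes_in_target, genes_in_background, p_value) tuples with
    p_value <= alpha, most significant first. No multiple-testing correction.
    """
    N, n = len(background_hits), len(target)
    bg_counts, target_counts = Counter(), Counter()
    for gene, families in background_hits.items():
        # Count genes (presence/absence), not individual sites.
        for pair in family_pairs(families):
            bg_counts[pair] += 1
            if gene in target:
                target_counts[pair] += 1

    results = []
    for pair, K in bg_counts.items():
        k = target_counts[pair]  # Counter returns 0 for pairs absent from the target
        results.append((pair, k, K, hypergeom_sf(k, N, K, n)))
    results.sort(key=lambda r: r[3])
    return [r for r in results if r[3] <= alpha]


# Toy, made-up data: ten background genes, four of them co-expressed.
background = {
    "g0": {"MADS", "bHLH"},
    "g1": {"MADS", "bHLH", "ETS"},
    "g2": {"MADS", "bHLH"},
    "g3": {"MADS", "bHLH"},
    "g4": {"MADS", "ETS"},
    "g5": {"bHLH", "ETS"},
    "g6": {"ETS"},
    "g7": {"MADS"},
    "g8": {"bHLH"},
    "g9": {"ETS", "MADS"},
}
target = {"g0", "g1", "g2", "g3"}

for pair, k, K, p in overrepresented_pairs(target, background):
    print(sorted(pair), k, K, round(p, 5))
# prints: ['MADS', 'bHLH'] 4 4 0.00476
```

Counting genes rather than sites, and restricting the search to pairs of families, keeps the number of tests small, which is in the spirit of the family-level focus described above. A fuller analysis would likely also correct for the many pairs tested (for example with a Bonferroni adjustment) and might require the sites to co-occur within a fixed window; both are left out of this sketch.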
