XcisClique: analysis of regulatory bicliques

Background

Modeling of cis-elements or regulatory motifs in promoter (upstream) regions of genes is a challenging computational problem. In this work, a set of regulatory motifs simultaneously present in the promoters of a set of genes is modeled as a biclique in a suitably defined bipartite graph. A biologically meaningful co-occurrence of multiple cis-elements in a gene promoter is assessed by the combined analysis of genomic and gene expression data. Greater statistical significance is associated with a set of genes that shares a common set of regulatory motifs while simultaneously exhibiting highly correlated gene expression under given experimental conditions.

Methods

XcisClique, the system developed in this work, is a comprehensive infrastructure that associates annotated genome and gene expression data, models known cis-elements as regular expressions, identifies maximal bicliques in a bipartite gene-motif graph, and ranks bicliques based on their computed statistical significance. Significance is a function of the probability of occurrence of the motifs in a biclique (modeled with a hypergeometric distribution) and of the new sum of absolute values (SAV) statistic, which uses Spearman correlations of gene expression vectors. SAV is a statistic well suited for this purpose, as described in the discussion.

Results

XcisClique identifies new motif and gene combinations that might indicate as yet unidentified involvement of sets of genes in biological functions and processes. It currently supports Arabidopsis thaliana and can be adapted to other organisms, assuming the existence of annotated genomic sequences, suitable gene expression data, and identified regulatory motifs. A subset of XcisClique functionalities, including the motif visualization component MotifSee, source code, and supplementary material, are available at https://bioinformatics.cs.vt.edu/xcisclique/.
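
The steps named in the Methods (motifs as regular expressions, a bipartite gene-motif graph, maximal bicliques, and a hypergeometric probability) can be illustrated with a minimal Python sketch. This is not the XcisClique implementation: the function names, the toy promoter sequences, the motif names `m1` to `m3`, and the brute-force closure used to enumerate bicliques are all assumptions made for illustration, and the hypergeometric tail shown is only one plausible reading of the significance term. The SAV statistic is not sketched because its definition is not given here.

```python
import re
from math import comb

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def motif_to_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))

def gene_motif_graph(promoters, motifs):
    """Bipartite graph as {gene: frozenset of motifs found in its promoter}.

    Only the given strand is searched (a simplification for this sketch).
    """
    patterns = {name: motif_to_regex(cons) for name, cons in motifs.items()}
    return {
        gene: frozenset(m for m, pat in patterns.items() if pat.search(seq.upper()))
        for gene, seq in promoters.items()
    }

def maximal_bicliques(graph, min_genes=2, min_motifs=1):
    """Enumerate maximal bicliques as (genes, motifs) pairs.

    Every maximal biclique has a motif side equal to an intersection of
    gene neighborhoods, so all such intersections are collected first.
    Exponential in the worst case; intended for toy inputs only.
    """
    closed = set()
    for neighborhood in graph.values():
        closed |= {neighborhood} | {neighborhood & c for c in closed}
    result = []
    for motif_set in closed:
        genes = frozenset(g for g, nb in graph.items() if motif_set <= nb)
        if len(genes) >= min_genes and len(motif_set) >= min_motifs:
            result.append((genes, motif_set))
    return result

def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n).

    Assumed reading: N genes in total, K of them carry the motif set,
    n genes are in a group of interest, and k of those carry the motif set.
    """
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)

# Toy data (hypothetical sequences and motif names, for illustration only).
motifs = {"m1": "CACGTG", "m2": "TTGACY", "m3": "RCCGAC"}
promoters = {
    "g1": "AACACGTGTTTTGACCAA",
    "g2": "GGCACGTGAATTGACTGG",
    "g3": "CACGTGAAGCCGACTT",
    "g4": "TTTTTTTT",
}

graph = gene_motif_graph(promoters, motifs)
for genes, motif_set in maximal_bicliques(graph):
    print(sorted(genes), sorted(motif_set))
```

On the toy input, the genes sharing both `m1` and `m2` form one biclique and the genes sharing `m1` alone form a larger one. A ranking step would then combine a probability such as `hypergeom_tail` with an expression-correlation score for each biclique.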
