A Generalized Hidden Markov Model for Prediction of Cis-regulatory Modules in Eukaryote Genomes and Description of Their Internal Structure

Eukaryotic regulatory regions have been studied extensively because of their importance for gene regulation in higher eukaryotes. However, our understanding of their organization remains incomplete; in particular, accurate in silico methods for predicting them are still lacking. Here we present a new method, based on a generalized hidden Markov model (HMM), for predicting regulatory regions in eukaryotic genomes using position weight matrices of the relevant transcription factors. The method reveals the structure of regulatory regions (preferred binding site arrangements) and then uses it both to improve prediction quality and to provide new knowledge about how regulatory regions are organized. We show that our method can be used successfully to identify regulatory regions in eukaryotic genomes with higher quality than other methods. We also demonstrate that our algorithm can reveal structural features of regulatory regions, which could help decipher the mechanisms of transcriptional regulation in higher eukaryotes.
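
The abstract does not spell out the model, so the following is only a minimal sketch of the general idea of scoring position weight matrices inside an HMM; it is not the authors' generalized HMM. It assumes a toy two-state model: a background state `B` that emits one nucleotide per step, and a module state `M` that emits either a spacer nucleotide or a whole binding site scored by a PWM, decoded with a Viterbi recursion over variable-length segments. The function names (`pwm_from_consensus`, `viterbi_modules`), the transition and start probabilities, `p_site`, the 0.97 match probability and the GATAAG consensus are all hypothetical choices for illustration.

```python
import math

BASES = "ACGT"


def pwm_from_consensus(consensus, match=0.97):
    """Hypothetical toy PWM: `match` for the consensus base, the rest split evenly."""
    other = (1.0 - match) / 3
    return [{b: (match if b == c else other) for b in BASES} for c in consensus]


def log_pwm_prob(pwm, word):
    """Log probability of `word` under a PWM (one dict of base probabilities per column)."""
    return sum(math.log(col[b]) for col, b in zip(pwm, word))


def viterbi_modules(seq, pwms, p_site=0.1, bg=None, trans=None, start=None):
    """Label each base of an uppercase ACGT sequence 'B' (background) or 'M' (module).

    Toy segmental Viterbi: B emits one base; M emits a spacer base or a whole
    PWM site. All default probabilities are illustrative assumptions.
    """
    bg = bg or {b: 0.25 for b in BASES}
    trans = trans or {"B": {"B": 0.99, "M": 0.01}, "M": {"M": 0.95, "B": 0.05}}
    start = start or {"B": 0.99, "M": 0.01}
    states = ("B", "M")
    n = len(seq)
    dp = [{s: float("-inf") for s in states} for _ in range(n + 1)]
    back = [{s: None for s in states} for _ in range(n + 1)]

    def enter(j, s):
        # Best log score for starting a new segment in state s at position j.
        if j == 0:
            return math.log(start[s]), None
        t = max(states, key=lambda u: dp[j][u] + math.log(trans[u][s]))
        return dp[j][t] + math.log(trans[t][s]), t

    for i in range(1, n + 1):
        x = seq[i - 1]
        # B: a single background nucleotide ending at i.
        score, prev = enter(i - 1, "B")
        dp[i]["B"] = score + math.log(bg[x])
        back[i]["B"] = (i - 1, prev)
        # M, option 1: a spacer nucleotide drawn from the background.
        score, prev = enter(i - 1, "M")
        best = (score + math.log(1 - p_site) + math.log(bg[x]), i - 1, prev)
        # M, option 2: a complete binding site of width w ending at i.
        for pwm in pwms:
            w = len(pwm)
            if i >= w:
                score, prev = enter(i - w, "M")
                cand = score + math.log(p_site / len(pwms)) + log_pwm_prob(pwm, seq[i - w:i])
                if cand > best[0]:
                    best = (cand, i - w, prev)
        dp[i]["M"], j, prev = best
        back[i]["M"] = (j, prev)

    # Trace back segment by segment from the best final state.
    labels = [""] * n
    s, i = max(states, key=lambda u: dp[n][u]), n
    while i > 0:
        j, prev = back[i][s]
        labels[j:i] = [s] * (i - j)
        i, s = j, prev
    return "".join(labels)


# Usage on a made-up sequence: flanks and spacers contain no 'A', so they
# cannot resemble the toy GATAAG site.
flank = "CCTTGCCTCGTTCCGTCTCC" * 2
spacer = "CTTCCGTCCT"
site = "GATAAG"
pwms = [pwm_from_consensus("GATAAG")]
print(viterbi_modules(flank + site + spacer + site + spacer + site + flank, pwms))
print(viterbi_modules(flank + site + flank, pwms))
```

Under these toy parameters, three sites 10 nt apart are labelled as one module, while a single isolated site is left as background, because one site alone does not pay for the B-to-M and M-to-B transition penalties. Unlike the method described above, this sketch has no states for preferred binding site arrangements and does not learn its parameters from data.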
