LocalMotif - An In-Silico Tool for Detecting Localized Motifs in Regulatory Sequences

In silico motif finding algorithms are often used to discover protein-DNA binding sites in a set of regulatory sequences. Current algorithms mainly address motif discovery in short sequences. Analyzing long sequences is challenging not only because the algorithm's time and memory requirements increase, but also because its accuracy decreases. However, when the motif is localized in a short interval of the long sequences relative to an anchor point, it can be detected easily by restricting the search to that interval. The difficulty is that the region of localization is not known a priori. This paper reports an algorithm called LocalMotif that detects localized motifs in long regulatory sequences. A novel score function predicts the region of localization of the motif. This score is combined with other scoring measures, including Z-score and relative entropy, to detect the motif. The algorithm is optimized for fast processing of long regulatory sequences. Tests on simulated and real datasets confirm that LocalMotif accurately determines the region of localization of motifs and automatically discovers the biologically relevant motifs, which other motif finding algorithms can detect only when the search is restricted to the relevant interval.
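The two conventional scoring measures named above, together with an interval-restricted search, can be illustrated with a minimal Python sketch. It is not LocalMotif's implementation: the novel localization score is not specified in this abstract and is not reproduced here. The function names, the uniform background, the pseudocount, the binomial null model for the Z-score, and the choice of the sequence start as the anchor point are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def relative_entropy(sites, background=None, pseudocount=0.5):
    """Relative entropy (bits) of aligned sites against a background.

    Sums p(b) * log2(p(b) / q(b)) over all columns and bases.
    Assumes equal-length sites and a uniform background by default.
    """
    if background is None:
        background = {b: 0.25 for b in ALPHABET}
    denom = len(sites) + pseudocount * len(ALPHABET)
    total = 0.0
    for col in range(len(sites[0])):
        counts = Counter(site[col] for site in sites)
        for b in ALPHABET:
            p = (counts[b] + pseudocount) / denom
            if p > 0:  # the term vanishes in the limit p -> 0
                total += p * math.log2(p / background[b])
    return total

def z_score(observed, n_positions, p_match):
    """Z-score of an occurrence count under an assumed binomial null."""
    expected = n_positions * p_match
    sd = math.sqrt(n_positions * p_match * (1.0 - p_match))
    return (observed - expected) / sd

def count_in_interval(sequences, motif, start, end, max_mismatches=0):
    """Count motif occurrences whose start lies in [start, end).

    Positions are 0-based offsets from the sequence start, which
    stands in for the anchor point in this sketch.
    """
    width = len(motif)
    hits = 0
    for seq in sequences:
        last = min(end, len(seq) - width + 1)
        for i in range(max(start, 0), last):
            mismatches = sum(a != b for a, b in zip(seq[i:i + width], motif))
            if mismatches <= max_mismatches:
                hits += 1
    return hits
```

Restricting `start` and `end` to a short interval lowers `n_positions`, and with it the expected number of chance matches, so the same observed count yields a higher Z-score. That is the effect of restricting the search that the abstract describes.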
