Paper Information - Suffix Tree Characterization of Maximal Motifs in Biological Sequences

Suffix Tree Characterization of Maximal Motifs in Biological Sequences

Finding motifs in biological sequences is one of the most intriguing problems for string algorithm designers, because it requires dealing with approximations, which complicates the problem. Existing algorithms run in time linear in the input size. Nevertheless, the output size can be very large due to the approximation. This often makes the output unreadable, in addition to slowing down the inference itself. Since a subset of the motifs, namely the maximal motifs, could be enough to convey the information of all of them, in this paper we aim at removing such redundancy. We define notions of maximality that we characterize in the suffix tree data structure. Given that this data structure is used by a whole class of motif extraction tools, we show how these tools can be modified to include the maximality requirement on the fly without changing the asymptotic complexity.

Nadia Pisanti | Maria Federico

[1] Alain Viari, et al. Searching for flexible repeated patterns using a non-transitive similarity relation, 1995, Pattern Recognit. Lett.

[2] Esko Ukkonen. Structural Analysis of Gapped Motifs of a String, 2007, MFCS.

[3] Marie-France Sagot, et al. An efficient algorithm for the identification of structured motifs in DNA promoter sequences, 2006, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[4] M. Sagot, et al. Promoter sequences and algorithmical methods for identifying them, 1999, Research in microbiology.

[5] Peter Weiner, et al. Linear Pattern Matching Algorithms, 1973, SWAT.

[6] Marie-France Sagot, et al. RISOTTO: Fast Extraction of Motifs with Mismatches, 2006, LATIN.

[7] Esko Ukkonen, et al. On-line construction of suffix trees, 1995, Algorithmica.

[8] Marie-France Sagot, et al. A highly scalable algorithm for the extraction of CIS-regulatory regions, 2005, APBC.

[9] Marie-France Sagot, et al. Algorithms for Extracting Structured Motifs Using a Suffix Tree with an Application to Promoter and Regulatory Site Consensus Identification, 2000, J. Comput. Biol.

[10] M. Sagot, et al. Inferring regulatory elements from a whole genome. An analysis of Helicobacter pylori sigma(80) family of promoter signals, 2000, Journal of molecular biology.

[11] Gregory Kucherov, et al. Finding Approximate Repetitions under Hamming Distance, 2001, ESA.