Algorithms for pattern matching and discovery in RNA secondary structure

Text-indexing structures offer significant advantages for many problems in string analysis and comparison, and are now widely used in the analysis of biological sequences. In this paper, we present applications of affix trees to exact and approximate pattern matching and discovery in RNA sequences. Because affix trees allow bidirectional search for symmetric patterns, they make it possible to discover and locate patterns that describe not only sequence regions but also the secondary structure that a given region could form, with improvements in theoretical and practical efficiency over existing methods. The search can be exact or approximate, and the approximation can be defined simultaneously for the sequence and the structure of the patterns. The approach presented in this paper could provide significant help in the analysis of RNA sequences, where functional motifs often involve structural as well as sequence constraints.
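
To make the idea of bidirectional search concrete, the following is a minimal sketch of how a hairpin (stem-loop) pattern can be located by starting from the loop and extending outward in both directions, requiring the newly added left and right characters to form a base pair. It is a naive scan over the sequence, not an affix tree and not the algorithm of the paper; an affix tree would support the same two-sided extension on an index of the sequence rather than on the raw text. The function name `find_hairpins`, its parameters, the inclusion of G-U wobble pairs, and the simple mismatch budget are all assumptions made for illustration.

```python
# Watson-Crick pairs plus the G-U wobble pair (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_hairpins(seq, loop, min_stem=3, max_mismatches=0):
    """Find occurrences of `loop` flanked by a complementary stem.

    Returns a list of (start, end, paired) tuples, where seq[start:end]
    is the hairpin region and `paired` is the number of base pairs found.
    """
    hits = []
    pos = seq.find(loop)
    while pos != -1:
        # Start just outside the loop and extend one position per side.
        left, right = pos - 1, pos + len(loop)
        paired = mismatches = 0
        best = None  # extent of the stem at the last true base pair
        while left >= 0 and right < len(seq):
            if (seq[left], seq[right]) in PAIRS:
                paired += 1
                best = (left, right + 1, paired)
            else:
                # Approximate search: tolerate a bounded number of unpaired positions.
                mismatches += 1
                if mismatches > max_mismatches:
                    break
            left -= 1
            right += 1
        if best is not None and best[2] >= min_stem:
            hits.append(best)
        pos = seq.find(loop, pos + 1)
    return hits
```

For example, `find_hairpins("AAGGGCGAAAGCCCUU", "GAAA")` reports a single hairpin spanning the whole sequence with six base pairs. Setting `max_mismatches=1` lets the stem absorb one unpaired position, which is one simple way to express approximation on the structure; approximation on the loop sequence is not modeled here.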
