RNA motif search with data-driven element ordering

Background: In this paper, we study the problem of RNA motif search in long genomic sequences. RNA motif search combines sequence and structure constraints to uncover new distant homologs of known functional RNAs. The problem is NP-hard and is traditionally solved by backtracking algorithms.

Results: We have designed a new algorithm for RNA motif search and implemented it in a new motif search tool, RNArobo. The tool extends the RNAbob descriptor language by allowing insertions in helices, which enables better characterization of ribozymes and aptamers. A typical RNA motif consists of multiple elements, and the running time of the algorithm depends heavily on the order in which they are searched. By approaching the element ordering problem in a principled way, we demonstrate a more than 100-fold speedup of the search for complex motifs compared to previously published tools.

Conclusions: We have developed a new method for RNA motif search that allows for a significant speedup of the search for complex motifs that include pseudoknots. Such speed improvements are crucial at a time when the rate of DNA sequencing outpaces growth in computing power. RNArobo is available at http://compbio.fmph.uniba.sk/rnarobo.
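The dependence of running time on element ordering can be illustrated with a deliberately simplified sketch. It is not the RNArobo algorithm: it assumes a motif made only of gap-free, fixed-length sequence elements, with no helices, insertions, or pseudoknots. The names `element_matches` and `search`, and the reduced IUPAC table, are hypothetical and chosen for illustration. The sketch counts symbol comparisons when the elements of each candidate window are tested in a given order.

```python
from typing import List, Sequence, Tuple

# Reduced IUPAC table (illustrative subset only).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "N": "ACGU",
}


def element_matches(seq: str, start: int, pattern: str) -> Tuple[bool, int]:
    """Test one element at a fixed position.

    Returns (matched, number of symbol comparisons performed).
    """
    comparisons = 0
    for offset, symbol in enumerate(pattern):
        comparisons += 1
        if seq[start + offset] not in IUPAC[symbol]:
            return False, comparisons
    return True, comparisons


def search(seq: str, elements: Sequence[str],
           order: Sequence[int]) -> Tuple[List[int], int]:
    """Scan seq for a motif whose elements are adjacent and gap-free.

    `order` lists element indices in the order they are tested within
    each candidate window; a window is abandoned at the first failure.
    Returns (hit start positions, total symbol comparisons).
    """
    # Offset of each element relative to the window start.
    offsets = [0]
    for pattern in elements[:-1]:
        offsets.append(offsets[-1] + len(pattern))
    total_len = sum(len(pattern) for pattern in elements)

    hits: List[int] = []
    work = 0
    for start in range(len(seq) - total_len + 1):
        window_ok = True
        for idx in order:
            matched, cost = element_matches(
                seq, start + offsets[idx], elements[idx])
            work += cost
            if not matched:
                window_ok = False
                break
        if window_ok:
            hits.append(start)
    return hits, work


# Toy input: an unconstrained element followed by a specific one.
seq = "ACGUACGUACGGGAUACGUACGU"
elements = ["NNNN", "GGGA"]

hits_left_to_right, work_left_to_right = search(seq, elements, [0, 1])
hits_selective_first, work_selective_first = search(seq, elements, [1, 0])
```

Both orderings report the same hits, but testing the selective element `GGGA` first rejects most windows after one or two comparisons, whereas the left-to-right order always spends four comparisons on `NNNN` before reaching the informative element. In a full backtracking search over helices and variable-length elements the gap between orderings can be far larger, which is the effect the abstract refers to.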
