Searching RNA motifs and their intermolecular contacts with constraint networks

MOTIVATION: Searching for occurrences of RNA genes in genomic sequences has gained renewed importance with the recent discovery of numerous functional RNAs, many of which interact with ligands. Although several programs exist for RNA motif search, none can represent and solve the problem of searching for occurrences of RNA motifs that interact with other molecules.

RESULTS: We present a constraint network formulation of this problem. RNAs are represented as structured motifs that can occur on more than one sequence and that are linked by possible hybridization. The implemented tool, MilPat, is used to search for several sRNA families in genomic sequences. Results show that MilPat can efficiently search for interacting motifs in large genomic sequences and offers a simple and extensible framework for solving such problems. Both known and new sRNAs are identified as H/ACA candidates in Methanocaldococcus jannaschii.

AVAILABILITY: http://carlit.toulouse.inra.fr/MilPaT/MilPat.pl.
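To make the idea of a constraint network over interacting motifs concrete, the following is a minimal Python sketch. It is not MilPat's algorithm or descriptor language: the function names, the pairing rules (Watson-Crick plus G-U wobble), and the naive enumeration are all assumptions chosen for illustration. The variables are start positions on two sequences, and the constraints are a word match, a distance bound, and a duplex that links the two sequences.

```python
from itertools import product

# Watson-Crick pairs plus G-U wobble; an assumption for this toy example.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_hybridize(strand_a, strand_b):
    """True if strand_a pairs with strand_b read in the antiparallel direction."""
    if len(strand_a) != len(strand_b):
        return False
    return all((x, y) in PAIRS for x, y in zip(strand_a, reversed(strand_b)))


def search_interacting_motif(guide, target, box, helix_len, max_gap):
    """Enumerate (box_pos, guide_pos, target_pos) satisfying all constraints.

    Hypothetical helper. Variables: the start of the box word and of the
    guide strand on `guide`, and the start of the paired strand on `target`.
    """
    # Unary constraint: the box word must occur at box_pos (domain filtering).
    box_domain = [i for i in range(len(guide) - len(box) + 1)
                  if guide[i:i + len(box)] == box]
    guide_domain = range(len(guide) - helix_len + 1)
    target_domain = range(len(target) - helix_len + 1)

    solutions = []
    for b, g in product(box_domain, guide_domain):
        # Binary distance constraint: the guide strand starts after the box
        # ends, separated by at most max_gap nucleotides.
        gap = g - (b + len(box))
        if not 0 <= gap <= max_gap:
            continue
        for t in target_domain:
            # Binary duplex constraint linking the two sequences.
            if can_hybridize(guide[g:g + helix_len], target[t:t + helix_len]):
                solutions.append((b, g, t))
    return solutions
```

For example, `search_interacting_motif("ACAUGGCUC", "UUAGCCUU", "ACA", 4, 1)` looks for an `ACA` box followed within one nucleotide by a 4-nt strand that can pair with some 4-nt window of the second sequence. A real solver would prune domains with local consistency rather than enumerate every assignment; the brute-force loop here only illustrates how the variables and constraints fit together.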
