Faster genome annotation of non-coding RNA families without loss of accuracy

Non-coding RNAs (ncRNAs) are functional RNA molecules that do not code for proteins. Covariance Models (CMs) are a useful statistical tool for finding new members of an ncRNA gene family in a large genome database, because they use both sequence and, importantly, RNA secondary structure information. Unfortunately, CM searches are slow. This paper shows how to make CM searches faster while provably sacrificing none of their accuracy. Specifically, our software builds a profile hidden Markov model (HMM) from the CM and uses it to filter the genome database. This HMM is a rigorous filter, i.e., it eliminates only sequences that provably could not be annotated as homologs by the CM. The CM is then run only on what remains. Optimizing the HMM for filtering involves minimizing an exponential objective function subject to linear inequality constraints. For most known ncRNA families, this allows an 8-gigabase database to be scanned in 2-20 days instead of years, and it finds new family members that other techniques for speeding up CM searches miss.
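The two ingredients described above can be illustrated with a deliberately tiny Python sketch. The first is a filter HMM whose score can never fall below the CM's score. The second is an exponential objective minimized under linear inequality constraints. This is not the paper's implementation. The hairpin model (a fixed 4-pair stem with no insertions or deletions), the pair scores in `PAIR_SCORE`, the threshold, and every function name are hypothetical. A CM base pair contributes one joint score `S[a, b]` that an HMM cannot represent. The sketch replaces it with independent scores `left[a]` and `right[b]` subject to `left[a] + right[b] >= S[a, b]`, which makes the filter score an upper bound on the toy CM score. The objective shown is `E[2**score]` under a uniform background. It is an assumed proxy justified by Markov's inequality, not necessarily the paper's exact formula. The split is found by simple block-coordinate tightening rather than a general constrained optimizer.

```python
import random

BASES = "ACGU"
BACKGROUND = {b: 0.25 for b in BASES}  # assumed i.i.d. uniform background

# Hypothetical pair scores (bits) used at every stem position of a toy CM:
# G-C pairs are favoured, A-U weaker, G-U neutral, other pairs penalised.
PAIR_SCORE = {
    (a, b): (3.0 if a + b in ("GC", "CG") else
             1.0 if a + b in ("AU", "UA") else
             0.0 if a + b in ("GU", "UG") else -2.0)
    for a in BASES for b in BASES
}
STEM, WIDTH, THRESHOLD = 4, 12, 10.0  # hypothetical hairpin shape and cutoff


def split_pair_scores(pair_score, right, rounds=10):
    """Find per-nucleotide scores with left[a] + right[b] >= pair_score[a, b].

    Alternately sets each side to the smallest values satisfying the linear
    constraints given the other side (block-coordinate tightening).
    The result is always feasible but not guaranteed to be optimal."""
    for _ in range(rounds):
        left = {a: max(pair_score[a, b] - right[b] for b in BASES) for a in BASES}
        right = {b: max(pair_score[a, b] - left[a] for a in BASES) for b in BASES}
    return left, right


def markov_bound(left, right):
    """E[2**(left[x] + right[y])] for independent background bases x, y.

    Markov's inequality gives P(score >= T) <= E[2**score] / 2**T, so this
    exponential objective bounds how often random sequence passes the filter."""
    ex = sum(BACKGROUND[a] * 2 ** left[a] for a in BASES)
    ey = sum(BACKGROUND[b] * 2 ** right[b] for b in BASES)
    return ex * ey


def cm_score(w):
    """Toy CM stand-in: stem position i pairs with position len(w) - 1 - i."""
    return sum(PAIR_SCORE[w[i], w[-1 - i]] for i in range(STEM))


def filter_score(w, left, right):
    """Toy HMM stand-in: each position is scored on its own, with no pair term,
    so filter_score(w) >= cm_score(w) whenever the split constraints hold."""
    return sum(left[w[i]] + right[w[-1 - i]] for i in range(STEM))


def scan(seq, left, right):
    """Run the cheap filter on every window and the CM only on survivors."""
    hits, cm_calls = [], 0
    for start in range(len(seq) - WIDTH + 1):
        w = seq[start:start + WIDTH]
        if filter_score(w, left, right) < THRESHOLD:
            continue  # safe to skip: cm_score(w) <= filter_score(w) < THRESHOLD
        cm_calls += 1
        if cm_score(w) >= THRESHOLD:
            hits.append(start)
    return hits, cm_calls


random.seed(0)
genome = "".join(random.choice(BASES) for _ in range(20000)) + "GCGCAAAAGCGC"
brute = [s for s in range(len(genome) - WIDTH + 1)
         if cm_score(genome[s:s + WIDTH]) >= THRESHOLD]

# Two starting points for the split: all weight on the left, or split evenly.
lopsided = split_pair_scores(PAIR_SCORE, {b: 0.0 for b in BASES})
even = split_pair_scores(PAIR_SCORE,
                         {b: max(PAIR_SCORE[a, b] for a in BASES) / 2 for b in BASES})
for name, (left, right) in (("lopsided", lopsided), ("even", even)):
    hits, calls = scan(genome, left, right)
    assert hits == brute  # rigorous filter: no CM hit is lost
    print(f"{name}: bound={markov_bound(left, right):.2f} "
          f"CM calls={calls} of {len(genome) - WIDTH + 1} windows, hits={len(hits)}")
```

Both splits satisfy the constraints, so both scans return exactly the brute-force CM hits. They differ only in how much work they save. The even split has the smaller bound (4.50 versus 5.00). Under this toy background, it lets roughly 14% of windows through to the CM, compared with roughly 31% for the lopsided split. This is the kind of trade-off the HMM optimization is meant to improve.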
