Riboswitch Detection Using Profile Hidden Markov Models

Background
Riboswitches are noncoding RNAs that regulate gene expression by switching from one structural conformation to another on ligand binding. The classes of riboswitches discovered so far are distinguished by the ligand whose binding induces the conformational switch. Every class of riboswitch is characterized by an aptamer domain, which provides the site for ligand binding, and an expression platform, which undergoes the conformational change on ligand binding. The sequence and structure of the aptamer domain are highly conserved among riboswitches of the same class. We propose a method for fast and accurate identification of riboswitches using profile Hidden Markov Models (pHMM). Our method exploits the high degree of sequence conservation that characterizes the aptamer domain.

Results
Our method can detect riboswitches in genomic databases rapidly and accurately. Its sensitivity is comparable to that of the method based on the Covariance Model (CM). For six out of ten riboswitch classes, our method detects more than 99.5% of the candidates identified by the much slower CM method while being several hundred times faster. For three riboswitch classes, our method detects 97-99% of the candidates relative to the CM method. Our method works very well for those classes of riboswitches that are characterized by distinct and conserved sequence motifs.

Conclusion
Riboswitches play a crucial role in controlling the expression of several prokaryotic genes involved in metabolism and transport processes. As new classes of riboswitches continue to be discovered, it is important to understand the patterns of their intra- and intergenomic distribution. Understanding such patterns will help clarify the evolutionary history of these genetic regulatory elements. However, a complete picture of the distribution of riboswitches will emerge only after they have been accurately identified across genomes. We believe that the riboswitch detection method developed in this paper will aid in that process. The substantial speed advantage of our pHMM-based approach over the CM-based method allows us to scan entire databases (rather than 5' UTRs only) in a relatively short time and accurately identify riboswitch candidates.
