UNEARTHING THE BURIED TREASURES: COMPUTATIONAL IDENTIFICATION AND ANALYSIS OF NONCODING RNAS

The central dogma of molecular biology states that genetic information flows from DNA to RNA to protein. This dogma has exerted a substantial influence on our understanding of genetic activity in cells. Under its influence, the prevailing assumption until recently was that genes are essentially repositories of protein-coding information and that proteins are responsible for most of the important biological functions in all cells. Meanwhile, the importance of RNA remained obscure, and RNA was viewed mainly as a passive intermediary that bridges the gap between DNA and protein. Apart from classic examples such as tRNAs (transfer RNAs) and rRNAs (ribosomal RNAs), functional noncoding RNAs were considered rare. However, this view has changed dramatically during the last decade, as systematic screening of various genomes has identified a myriad of noncoding RNAs (ncRNAs), which are RNA molecules that function without being translated into proteins [11, 40]. Many ncRNAs are now known to play important roles in various biological processes. Because RNAs can interact with other RNAs and with DNA in a sequence-specific manner, they are especially well suited to tasks that require highly specific nucleotide recognition [11]. Good examples are the miRNAs (microRNAs), which regulate gene expression by targeting mRNAs (messenger RNAs) [4, 20], and the siRNAs (small interfering RNAs), which take part in the RNAi (RNA interference) pathways for gene silencing [29, 30]. Recent findings show that ncRNAs are extensively involved in many gene regulatory mechanisms [14, 17]. The known roles of ncRNAs are truly diverse. They include transcription and translation control, chromosome replication, RNA processing and modification, and protein degradation and translocation [40], to name just a few.
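To make the idea of sequence-specific recognition concrete, the following minimal Python sketch (our own illustration, not a method from the cited works) locates the sites in a target RNA that are perfectly complementary to a short guide RNA. It assumes strict Watson-Crick pairing (A-U, G-C) between antiparallel strands; real miRNA-target interactions typically tolerate mismatches and G-U wobble pairs, which this simplified sketch ignores. The function names and the sequences used are hypothetical.

```python
# Watson-Crick pairing partners for RNA nucleotides.
# Simplifying assumption: G-U wobble pairs and mismatches are not allowed.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the sequence that pairs with `rna` when the strands are antiparallel."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_complementary_sites(guide: str, target: str) -> list[int]:
    """Return 0-based start positions in `target` that perfectly pair with `guide`.

    Because paired RNA strands run antiparallel, the guide binds wherever the
    target contains the guide's reverse complement.
    """
    site = reverse_complement(guide)
    k = len(site)
    return [i for i in range(len(target) - k + 1) if target[i:i + k] == site]


if __name__ == "__main__":
    guide = "UAGCUUAUC"                    # hypothetical short guide RNA
    target = "AAGAUAAGCUAGGCCGAUAAGCUAAA"  # hypothetical target mRNA fragment
    print(find_complementary_sites(guide, target))  # prints [2, 15]
```

Here the guide's reverse complement is GAUAAGCUA, which occurs at positions 2 and 15 of the toy target. Changing a single nucleotide inside either site would abolish that match, which illustrates the specificity the text refers to.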
It has even been claimed that ncRNAs dominate the genomic output of higher organisms such as mammals, and that the greater portion of their genomes, which does not encode proteins, is dedicated to the control and regulation of cell development [27]. As evidence accumulates, ncRNAs, which were neglected for a long time, are attracting increasing attention. Researchers have begun to realize that the vast majority of the genome that was regarded as “junk”, mainly because it was not well understood, may indeed hold

∗Both authors are with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125, USA.
†Work supported in part by the NSF grant CCF-0428326 and the Microsoft Research Graduate Fellowship.

REFERENCES

[1] Elena Rivas et al. Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs. Bioinformatics, 2000.

[2] Steve Young et al. Applications of stochastic context-free grammars using the Inside-Outside algorithm. 1990.

[3] Peter F. Stadler et al. Fast and reliable prediction of noncoding RNAs. Proc. Natl. Acad. Sci. USA, 2005.

[4] P. P. Vaidyanathan et al. The role of signal-processing concepts in genomics and proteomics. Journal of the Franklin Institute, 2004.

[5] D. Lipman et al. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA, 1988.

[6] Maciej Szymanski et al. The non-coding RNAs as riboregulators. Nucleic Acids Research, 2001.

[7] S. Tiwari et al. Prediction of probable genes by Fourier analysis of genomic sequences. Computer Applications in the Biosciences, 1997.

[8] Byung-Jun Yoon et al. An overview of the role of context-sensitive HMMs in the prediction of ncRNA genes. IEEE/SP 13th Workshop on Statistical Signal Processing, 2005.

[9] Michael T. McManus et al. Gene silencing in mammals by small interfering RNAs. Nature Reviews Genetics, 2002.

[10] D. Bartel. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell, 2004.

[11] E. Dam et al. Structural and functional aspects of RNA pseudoknots. Biochemistry, 1992.

[12] R. Durbin et al. RNA sequence analysis using covariance models. Nucleic Acids Research, 1994.

[13] R. Breaker et al. Gene regulation by riboswitches. Nature Reviews Molecular Cell Biology, 2004.

[14] Elena Rivas et al. The language of RNA: a formal grammar that includes pseudoknots. Bioinformatics, 2000.

[15] James A. Birchler et al. RNAi-mediated pathways in the nucleus. Nature Reviews Genetics, 2005.

[16] B. Berger et al. MSARI: multiple sequence alignments for statistical detection of RNA secondary structure. Proc. Natl. Acad. Sci. USA, 2004.

[17] P. Vaidyanathan. Genomics and proteomics: a signal processor's tour. IEEE Circuits and Systems Magazine, 2004.

[18] S. Gottesman et al. Stealth regulation: biological circuits with small RNA switches. Genes & Development, 2002.

[19] J. Mattick. Challenging the dogma: the hidden layer of non-protein-coding RNAs in complex organisms. BioEssays, 2003.

[20] Michael A. Harrison et al. Introduction to formal language theory. 1978.

[21] Hiroshi Matsui et al. Pair stochastic tree adjoining grammars for aligning and predicting pseudoknot RNA structures. Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference (CSB 2004), 2004.

[22] David S. Johnson et al. Computers and intractability: a guide to the theory of NP-completeness. 1978.

[23] E. Myers et al. Basic local alignment search tool. Journal of Molecular Biology, 1990.

[24] Byung-Jun Yoon et al. Context-sensitive hidden Markov models for modeling long-range dependencies in symbol sequences. IEEE Transactions on Signal Processing, 2006.

[25] S. Eddy. Non-coding RNA genes and the modern RNA world. Nature Reviews Genetics, 2001.

[26] R. Durbin et al. Biological sequence analysis: background on probability. 1998.

[27] Elena Rivas et al. Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics, 2001.

[28] D. Haussler et al. A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Research, 1994.

[29] Dimitris Anastassiou et al. Genomic signal processing. IEEE Signal Processing Magazine, 2001.

[30] G. Storz. An expanding universe of noncoding RNAs. Science, 2002.

[31] Sean R. Eddy et al. Rfam: an RNA family database. Nucleic Acids Research, 2003.

[32] Peter F. Stadler et al. Prediction of consensus RNA secondary structures including pseudoknots. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2004.

[33] S. Eddy et al. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Research, 1997.

[34] Gary Ruvkun et al. Glimpses of a tiny RNA world. Science, 2001.

[35] Jorma Rissanen et al. Partially hidden Markov models. IEEE Transactions on Information Theory, 1996.

[36] Sean R. Eddy et al. RSEARCH: finding homologs of single structured RNA sequences. BMC Bioinformatics, 2003.

[37] P. Stadler et al. Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nature Biotechnology, 2005.

[38] S. Eddy. Computational genomics of noncoding RNA genes. Cell, 2002.

[39] Lin He et al. MicroRNAs: small RNAs with a big role in gene regulation. Nature Reviews Genetics, 2004.

[40] V. Moulton. Tracking down noncoding RNAs. Proc. Natl. Acad. Sci. USA, 2005.

[41] Sean R. Eddy et al. How do RNA folding algorithms work? Nature Biotechnology, 2004.

[42] Diego di Bernardo et al. ddbRNA: detection of conserved secondary structures in multiple alignments. Bioinformatics, 2003.

[43] Jerrold R. Griggs et al. Algorithms for loop matchings. 1978.

[44] Noam Chomsky et al. On certain formal properties of grammars. Information and Control, 1959.

[45] Jeffrey E. Barrick et al. Metabolite-binding RNA domains are present in the genes of eukaryotes. RNA, 2003.

[46] A. Bairoch et al. PROSITE: recent developments. Nucleic Acids Research, 1994.

[47] Weixiong Zhang et al. An iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots. Bioinformatics, 2004.

[48] P. P. Vaidyanathan et al. Profile context-sensitive HMMs for probabilistic modeling of sequences with complex correlations. Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, 2006.

[49] Michael Zuker et al. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Research, 1981.

[50] R. Breaker et al. Riboswitches as versatile gene control elements. Current Opinion in Structural Biology, 2005.

[51] D. Haussler et al. Hidden Markov models in computational biology: applications to protein modeling. Journal of Molecular Biology, 1993.

[52] R. C. Underwood et al. Stochastic context-free grammars for tRNA modeling. Nucleic Acids Research, 1994.

[53] Byung-Jun Yoon et al. HMM with auxiliary memory: a new tool for modeling RNA structures. Conference Record of the Thirty-Eighth Asilomar Conference on Signals, Systems and Computers, 2004.