Sequence and Structural Analyses for Functional Non-coding RNAs

Analysis and detection of functional RNAs are currently important topics in both molecular biology and bioinformatics research. Several computational methods based on stochastic context-free grammars (SCFGs) have been developed for modeling and analysing functional RNA sequences. These grammatical methods have succeeded in modeling typical secondary structures of RNAs and are used for structural alignments of RNA sequences. Such stochastic models, however, are not sufficient to discriminate member sequences of an RNA family from non-members, and hence to detect non-coding RNA regions from genome sequences. Recently, the support vector machine (SVM) and kernel function techniques have been actively studied and proposed as a solution to various problems in bioinformatics. SVMs are trained from positive and negative samples and have strong, accurate discrimination abilities, and hence are more appropriate for the discrimination tasks. A few kernel functions that extend the string kernel to measure the similarity of two RNA sequences from the viewpoint of secondary structures have been proposed. In this article, we give an overview of recent progress in SCFG-based methods for RNA sequence analysis and novel kernel functions tailored to measure the similarity of two RNA sequences and developed for use with support vector machines (SVM) in discriminating members of an RNA family from non-members.

[1]  Kiyoshi Asai,et al.  Stem Kernels for RNA Sequence Analyses , 2007, BIRD.

[2]  Michael Zuker,et al.  Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information , 1981, Nucleic Acids Res..

[3]  R. C. Underwood,et al.  Stochastic context-free grammars for tRNA modeling. , 1994, Nucleic acids research.

[4]  Peter F Stadler,et al.  Fast and reliable prediction of noncoding RNAs , 2005, Proc. Natl. Acad. Sci. USA.

[5]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.

[6]  Ivo L. Hofacker,et al.  Vienna RNA secondary structure server , 2003, Nucleic Acids Res..

[7]  Tom H. Pringle,et al.  The human genome browser at UCSC. , 2002, Genome research.

[8]  Elena Rivas,et al.  The language of RNA: a formal grammar that includes pseudoknots , 2000, Bioinform..

[9]  Ian Holmes,et al.  Stem Stem Stem Stem Loop Loop Loop LoopLoop Loop Loop Loop Loop Loop Loop , 2005 .

[10]  Sean R. Eddy,et al.  Rfam: annotating non-coding RNAs in complete genomes , 2004, Nucleic Acids Res..

[11]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[12]  Kiyoshi Asai,et al.  Marginalized kernels for biological sequences , 2002, ISMB.

[13]  J. McCaskill The equilibrium partition function and base pair binding probabilities for RNA secondary structure , 1990, Biopolymers.

[14]  Russell L. Malmberg,et al.  Stochastic modeling of RNA pseudoknotted structures: a grammatical approach , 2003, ISMB.

[15]  David Haussler,et al.  Identification and Classification of Conserved RNA Secondary Structures in the Human Genome , 2006, PLoS Comput. Biol..

[16]  Yasuo Tabei,et al.  SCARNA: fast and accurate structural alignment of RNA sequences by matching fixed-length stem fragments , 2006, Bioinform..

[17]  R. Durbin,et al.  RNA sequence analysis using covariance models. , 1994, Nucleic acids research.

[18]  Hiroshi Matsui,et al.  Pair stochastic tree adjoining grammars for aligning and predicting pseudoknot RNA structures , 2004, Proceedings. 2004 IEEE Computational Systems Bioinformatics Conference, 2004. CSB 2004..

[19]  Bernhard Schölkopf,et al.  Kernel Methods in Computational Biology , 2005 .

[20]  Yasubumi Sakakibara,et al.  RNA secondary structural alignment with conditional random fields , 2005, ECCB/JBI.

[21]  Gary D. Stormo,et al.  Pairwise local structural alignment of RNA sequences with sequence similarity less than 40% , 2005, Bioinform..

[22]  Serafim Batzoglou,et al.  CONTRAfold: RNA secondary structure prediction without physics-based models , 2006, ISMB.

[23]  Satoshi Kobayashi,et al.  Tree Adjoining Grammars for RNA Structure Prediction , 1999, Theor. Comput. Sci..

[24]  Sean R. Eddy,et al.  RSEARCH: Finding homologs of single structured RNA sequences , 2003, BMC Bioinformatics.

[25]  P. Stadler,et al.  Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome , 2005, Nature Biotechnology.

[26]  Bjarne Knudsen,et al.  RNA secondary structure prediction using stochastic context-free grammars and evolutionary history , 1999, Bioinform..

[27]  Daiki Matsuda,et al.  The tRNA-like structure of Turnip yellow mosaic virus RNA is a 3'-translational enhancer. , 2004, Virology.

[28]  Kiyoshi Asai,et al.  Marginalized kernels for RNA sequence data analysis. , 2002, Genome informatics. International Conference on Genome Informatics.

[29]  Peter F. Stadler,et al.  Alignment of RNA base pairing probability matrices , 2004, Bioinform..

[30]  D. Sankoff Simultaneous Solution of the RNA Folding, Alignment and Protosequence Problems , 1985 .

[31]  Elena Rivas,et al.  Noncoding RNA gene detection using comparative sequence analysis , 2001, BMC Bioinformatics.

[32]  Tatsuya Akutsu Recent Advances in RNA Secondary Structure Prediction with Pseudoknots , 2006 .

[33]  Durbin,et al.  Biological Sequence Analysis , 1998 .

[34]  D. Turner,et al.  Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. , 2002, Journal of molecular biology.

[35]  Nello Cristianini,et al.  Classification using String Kernels , 2000 .

[36]  Yasubumi Sakakibara,et al.  Pair hidden Markov models on tree structures , 2003, ISMB.

[37]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[38]  S. Eddy Non–coding RNA genes and the modern RNA world , 2001, Nature Reviews Genetics.