A predictive model for secondary RNA structure using graph theory and a neural network

Background: Determining the secondary structure of RNA from the primary structure is a challenging computational problem. A number of algorithms have been developed to predict the secondary structure from the primary structure. It is agreed that there is still room for improvement in each of these approaches. In this work we build a predictive model for secondary RNA structure using a graph-theoretic tree representation of secondary RNA structure. We model the bonding of two RNA secondary structures to form a larger secondary structure with a graph operation we call merge. We consider all combinatorial possibilities using all possible tree inputs, both those that are RNA-like in structure and those that are not. The resulting data from each tree merge operation is represented by a vector. We use these vectors as input values for a neural network and train the network to recognize a tree as RNA-like or not, based on the merge data vector. The network estimates the probability of a tree being RNA-like.

Results: The network correctly assigned a high probability of RNA-likeness to trees previously identified as RNA-like and a low probability of RNA-likeness to those classified as not RNA-like. We then used the neural network to predict the RNA-likeness of the unclassified trees.

Conclusions: There are a number of secondary RNA structure prediction algorithms available online. These programs are based on finding the secondary structure with the lowest total free energy. In this work, we create a predictive tool for secondary RNA structures using graph-theoretic values as input for a neural network. The use of a graph operation to theoretically describe the bonding of secondary RNA is novel and is an entirely different approach to the prediction of secondary RNA structures. Our method correctly predicted trees to be RNA-like or not RNA-like for all known cases. In addition, our results convey a measure of likelihood that a tree is RNA-like or not RNA-like. Given that the majority of secondary RNA folding algorithms return more than one possible outcome, our method provides a means of determining the best or most likely structures among all of the possible outcomes.
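The abstract does not spell out the merge operation, so the following is only an illustrative sketch under one assumption: that merging two trees means identifying (gluing together) one chosen vertex from each tree. The function names `merge` and `degree_sequence`, the edge-list representation, and the use of degree sequences as a summary are all hypothetical choices made for this example; they are not the merge data vector or the neural network inputs used in the paper.

```python
from itertools import product


def merge(edges1, n1, u, edges2, n2, v):
    """Glue tree 2 onto tree 1 by identifying vertex u (tree 1) with v (tree 2).

    Trees are edge lists over vertices 0..n-1. This vertex-identification
    definition of "merge" is an assumption made for illustration.
    """
    def relabel(w):
        # v becomes u; every other vertex of tree 2 gets a fresh label >= n1.
        if w == v:
            return u
        return n1 + w - (1 if w > v else 0)

    merged = list(edges1) + [(relabel(a), relabel(b)) for a, b in edges2]
    return merged, n1 + n2 - 1  # one vertex is shared, so the count drops by 1


def degree_sequence(edges, n):
    """Sorted vertex degrees: a simple, label-independent summary of a tree."""
    deg = [0] * n
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return tuple(sorted(deg, reverse=True))


path3 = [(0, 1), (1, 2)]          # path on 3 vertices, middle vertex is 1
star4 = [(0, 1), (0, 2), (0, 3)]  # star on 4 vertices, center is 0

# Merge the middle of the path with the center of the star.
merged_edges, merged_n = merge(path3, 3, 1, star4, 4, 0)

# Enumerate every choice of (u, v) and collect the distinct summaries.
distinct = {
    degree_sequence(*merge(path3, 3, u, star4, 4, v))
    for u, v in product(range(3), range(4))
}
```

Enumerating every vertex pair, as in the last statement, mirrors the idea of considering all combinatorial possibilities for a pair of input trees. A degree sequence does not distinguish all non-isomorphic trees, so a real pipeline would need a stronger tree identifier.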
