Evaluating Distance Measures for RNA Motif Search

This paper extends an earlier study that outlined a bioinformatic pipeline for exploratory search for RNA motifs incorporating both primary and secondary structure. The pipeline is applied to three data sets, one of which is a larger version of the one used in the earlier study. Instead of a single method of estimating the distance between RNA folds, four distance measures were tested. The data sets are: a set of random control sequences, a set of synthetic sequences with simple designed folds, and the iron response element data set, for which actual biological RNA folds are available. The pipeline produces clusters that contain the known motifs in the biological data and the motifs designed into the synthetic data. The results vary substantially across the distance measures. One of them, difference in energy, is found to be too simplistic to be useful for differentiating motifs, while the other three all show some degree of merit. At the heart of the pipeline is a non-linear projection algorithm that uses evolutionary computation to display the intra-RNA-fold distances so that the various distance measures can be visually compared. While the performance of this algorithm is acceptable, suggestions for improving it are made.
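The abstract does not specify how the projection works. What follows is only a minimal sketch, and it rests on several assumptions. Each RNA fold is placed as a point in the plane so that Euclidean distances approximate the chosen fold-to-fold distances, using a least-squares "stress" objective. The layout is searched with a simple steady-state evolutionary algorithm that uses size-4 tournaments, uniform crossover over points and Gaussian mutation of one point. None of these choices, and none of the names `energy_distance`, `stress` and `project`, come from the paper. The energy-difference measure is included because it shows why that measure is weak: it reduces each fold to a single number, so two structurally different folds with equal free energy sit at distance zero.

```python
import math
import random


def energy_distance(energies):
    """Pairwise |E_i - E_j| matrix: the 'difference in energy' measure.

    Each fold is reduced to one free-energy value, so distinct folds with
    equal energy are indistinguishable (distance 0).
    """
    n = len(energies)
    return [[abs(energies[i] - energies[j]) for j in range(n)] for i in range(n)]


def stress(points, dist):
    """Sum of squared differences between target and 2-D Euclidean distances."""
    total = 0.0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            total += (math.hypot(dx, dy) - dist[i][j]) ** 2
    return total


def project(dist, pop_size=30, generations=2000, sigma=0.1, seed=0):
    """Hypothetical steady-state evolutionary search for a low-stress 2-D layout.

    Returns (best_layout, best_stress).
    """
    rng = random.Random(seed)
    n = len(dist)
    scale = max(max(row) for row in dist) or 1.0  # spread of the target distances

    def random_layout():
        return [(rng.uniform(0, scale), rng.uniform(0, scale)) for _ in range(n)]

    pop = [random_layout() for _ in range(pop_size)]
    fit = [stress(p, dist) for p in pop]

    for _ in range(generations):
        # Size-4 tournament: the two fittest become parents, and the
        # least fit member is replaced by their child.
        idx = rng.sample(range(pop_size), 4)
        idx.sort(key=lambda k: fit[k])
        a, b, worst = pop[idx[0]], pop[idx[1]], idx[3]

        # Uniform crossover: each point's position comes from either parent.
        child = [a[k] if rng.random() < 0.5 else b[k] for k in range(n)]

        # Gaussian mutation of one randomly chosen point.
        m = rng.randrange(n)
        x, y = child[m]
        child[m] = (x + rng.gauss(0, sigma * scale), y + rng.gauss(0, sigma * scale))

        pop[worst], fit[worst] = child, stress(child, dist)

    best = min(range(pop_size), key=lambda k: fit[k])
    return pop[best], fit[best]


if __name__ == "__main__":
    energies = [-10.0, -10.0, -20.0, -25.0]  # hypothetical fold free energies
    d = energy_distance(energies)
    pts, s = project(d)
    for e, (x, y) in zip(energies, pts):
        print(f"E={e:6.1f}  x={x:7.2f}  y={y:7.2f}")
    print("stress:", round(s, 3))
```

In the usage example, the first two folds have the same energy, so the energy-difference measure places them at distance zero however different their structures are. A structure-aware distance matrix could be passed to the same `project` function to compare measures visually. Because this is a sketch, the population size, generation count and mutation width are illustrative values only. They are not tuned to the paper's data.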
