Genome-Wide Prediction of Coaxial Helical Stacking Using Random Forests and Covariance Models

Developing effective artificial intelligence tools to find motifs in DNA, RNA and proteins is a challenging yet important problem in life science research. In this paper, we present a computational approach for finding RNA tertiary motifs in genomic sequences. Specifically, we predict the genomic coordinates of coaxial helical stacking motifs in 3-way RNA junctions. A coaxial helical stacking tertiary motif occurs in a 3-way RNA junction when two separate helical elements form a pseudocontiguous helix, which contributes to the thermodynamic stability of the RNA molecule as a whole. The predictions are produced by our tertiary motif search package, CSminer, which combines two versatile methodologies: covariance models and random forests. CSminer first uses a genome-wide search based on covariance models to find genomic regions that may contain a coaxial helical stacking tertiary motif. It then uses a random forest classifier to predict whether each such region indeed contains the motif. Experimental results demonstrate the effectiveness of our approach.
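The two-stage design described in the abstract can be illustrated with a minimal Python sketch. In this design, a covariance-model search proposes candidate regions and a random forest accepts or rejects each one. This is not CSminer's implementation, and several parts are assumptions. The window scorer is only a placeholder for a covariance-model score; in practice such scores might come from a tool such as Infernal's `cmsearch`. The forest uses depth-1 trees (decision stumps) instead of full decision trees. All function names, features and parameters are hypothetical.

```python
# Hypothetical sketch of a filter-then-classify scan; not CSminer's code.
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Window = Tuple[int, int]              # 0-based, half-open (start, end)
Stump = Tuple[int, float, int, int]   # (feature, threshold, label_if_le, label_if_gt)


def candidate_windows(genome: str, width: int, step: int,
                      score: Callable[[str], float],
                      threshold: float) -> List[Window]:
    """Stage 1: keep sliding windows whose score reaches the threshold.

    `score` stands in for a covariance-model bit score; this toy does not
    implement a covariance model itself.
    """
    hits = []
    for start in range(0, len(genome) - width + 1, step):
        if score(genome[start:start + width]) >= threshold:
            hits.append((start, start + width))
    return hits


def _majority(labels: Sequence[int]) -> int:
    # Most frequent label; 0 for an empty side of a split.
    return Counter(labels).most_common(1)[0][0] if labels else 0


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int],
              features: Sequence[int]) -> Stump:
    """Depth-1 tree that minimizes misclassifications over the given features."""
    best, best_err = None, None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab, r_lab = _majority(left), _majority(right)
            err = (sum(lab != l_lab for lab in left)
                   + sum(lab != r_lab for lab in right))
            if best_err is None or err < best_err:
                best, best_err = (f, t, l_lab, r_lab), err
    return best


def fit_forest(X, y, n_trees: int = 25, seed: int = 0) -> List[Stump]:
    """Random-forest style ensemble: bootstrap rows + random feature subset per tree."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = max(1, int(d ** 0.5))          # features tried per tree (sqrt rule)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feats = rng.sample(range(d), k)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest: Sequence[Stump], row: Sequence[float]) -> int:
    """Majority vote over the stumps."""
    votes = Counter(le if row[f] <= t else gt for f, t, le, gt in forest)
    return votes.most_common(1)[0][0]


def two_stage_scan(genome, width, step, score, threshold, featurize, forest):
    """Stage 1 proposes windows; stage 2 keeps those the forest labels 1."""
    return [w for w in candidate_windows(genome, width, step, score, threshold)
            if predict(forest, featurize(genome[w[0]:w[1]])) == 1]
```

For example, suppose a GC-fraction scorer (a hypothetical stand-in) is run over `"AAAAGGGGCCCCAAAA"` with `width=4`, `step=4` and `threshold=0.75`. Stage 1 then proposes the windows `(4, 8)` and `(8, 12)`. Stage 2 filters those windows using whatever numeric features `featurize` extracts. Keeping the expensive, sensitive search separate from a cheap classifier lets the second stage trade recall for precision without re-scanning the genome.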
