Prediction of novel precursor miRNAs using a context-sensitive hidden Markov model (CSHMM)

Background: It has become apparent over the last few years that small non-coding RNAs (ncRNAs) play a significant role in biological regulation. Among these, microRNAs (miRNAs), small regulatory RNAs of 22-23 nucleotides, have been a major object of study because they are involved in some basic biological processes. So far about 706 miRNAs have been identified in humans alone, and many more are expected to be encoded in the human genome. In this report, a "context-sensitive" hidden Markov model (CSHMM) that represents miRNA structures is proposed and tested extensively. We also demonstrate how this model can be used in conjunction with filters as an ab initio method for miRNA identification.

Results: The probabilities of the CSHMM were estimated using known human miRNA sequences. A classifier for miRNAs based on the likelihood score of this "trained" CSHMM was evaluated by: (a) cross-validation estimates using known human sequences, (b) predictions on a dataset of known miRNAs, and (c) predictions on a dataset of non-coding RNAs. The CSHMM is compared with two recently developed methods, miPred and CID-miRNA. The results suggest that the CSHMM performs better than these methods. In addition, the CSHMM was used in a pipeline that includes filters that check for the presence of EST matches and the presence of Drosha cutting sites. This pipeline was used to scan human chromosome 19 and identify potential miRNAs. It was also used to identify novel miRNAs from small RNA sequences of normal human leukocytes obtained by deep sequencing (Solexa). A total of 49 and 308 novel miRNAs were predicted from chromosome 19 and from the small RNA sequences, respectively.

Conclusion: The results suggest that the CSHMM is likely to be a useful tool for miRNA discovery, either for analysis of individual sequences or for genome scans. Our pipeline, consisting of a CSHMM and filters to reduce false positives, shows promise as an approach for ab initio identification of novel miRNAs.
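The pipeline described above (a likelihood-score classifier followed by filters) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Candidate` fields, the `run_pipeline` function, the toy scores, and the threshold value are all hypothetical stand-ins, and the CSHMM scoring itself is assumed to have been computed elsewhere.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Candidate:
    """A candidate precursor; every field here is a hypothetical stand-in."""
    name: str
    score: float            # assumed precomputed CSHMM likelihood score
    has_est_match: bool     # assumed result of an EST-match check
    has_drosha_site: bool   # assumed result of a Drosha cutting-site check


Filter = Callable[[Candidate], bool]


def run_pipeline(candidates: Iterable[Candidate],
                 threshold: float,
                 filters: List[Filter]) -> List[Candidate]:
    """Keep candidates whose score reaches the threshold and that pass all filters."""
    kept = []
    for cand in candidates:
        if cand.score < threshold:
            continue  # rejected by the likelihood-score classifier
        if all(f(cand) for f in filters):
            kept.append(cand)  # survived every false-positive filter
    return kept


# Toy data with made-up scores, for illustration only.
toy_candidates = [
    Candidate("cand_a", score=-10.0, has_est_match=True, has_drosha_site=True),
    Candidate("cand_b", score=-50.0, has_est_match=True, has_drosha_site=True),
    Candidate("cand_c", score=-5.0, has_est_match=False, has_drosha_site=True),
]

predicted = run_pipeline(
    toy_candidates,
    threshold=-20.0,  # hypothetical cut-off, not a value from the paper
    filters=[lambda c: c.has_est_match, lambda c: c.has_drosha_site],
)
print([c.name for c in predicted])
```

Applying the score threshold before the filters mirrors the order in which the abstract presents the steps; the actual ordering and cut-offs used by the authors are not stated here.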
