Adjacent nucleotide dependence in ncRNA and order-1 SCFG for ncRNA identification

Background: Non-coding RNAs (ncRNAs) are involved in many critical biological processes, and identifying ncRNAs is an important task in biological research. Infernal, a popular software package, is the most successful ncRNA prediction tool and exhibits high sensitivity. So far, Infernal has mainly been applied to small suspected regions. When we applied Infernal at the chromosome level, the results had high sensitivity but contained many false positives. It is therefore desirable to enhance Infernal further for chromosome-level or genome-wide studies.

Methodology: Based on the conjecture that adjacent nucleotide dependence affects the stability of an ncRNA's secondary structure, we first conducted a systematic study of human ncRNAs and found that adjacent nucleotide dependence in human ncRNAs should be useful for identifying ncRNAs. We then incorporated this dependence into the stochastic context-free grammar (SCFG) model and developed a new order-1 SCFG model for identifying ncRNAs.

Conclusions: In our experiments on human chromosomes, the proposed model eliminated more than 50% of the false positives reported by Infernal while maintaining the same sensitivity. The executables and source code are freely available at http://i.cs.hku.hk/~kfwong/order1scfg.

Citation: Wong TKF, Lam T-W, Sung W-K, Yiu S-M (2010) Adjacent Nucleotide Dependence in ncRNA and Order-1 SCFG for ncRNA Identification. PLoS ONE 5(9): e12848. doi:10.1371/journal.pone.0012848

Editor: Thomas Mailund, Aarhus University, Denmark

Received: March 24, 2010; Accepted: August 25, 2010; Published: September 28, 2010

Copyright: © 2010 Wong et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The authors have no support or funding to report.

Competing Interests: The authors have declared that no competing interests exist.
* E-mail: kfwong@cs.hku.hk
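The abstract does not say how adjacent nucleotide dependence is measured or exactly how it enters the grammar. As a rough illustration only (not the authors' method or code), the sketch below quantifies the dependence in two simple ways: the mutual information between a nucleotide and its right neighbour, and the log-odds of an order-1 Markov model (each base conditioned on the previous base) against an order-0 model (bases treated as independent). An order-1 SCFG presumably applies a similar kind of neighbour conditioning to the emissions of a grammar; that grammar machinery is not shown. The function names, the pseudocount of 1 and the skipping of non-ACGU symbols are assumptions made for this example.

```python
import math
from collections import Counter

ALPHABET = "ACGU"  # RNA alphabet; other symbols (N, gaps) are skipped


def adjacent_pairs(seqs):
    """Yield every (previous, current) nucleotide pair in the given sequences."""
    for s in seqs:
        s = s.upper().replace("T", "U")  # also accept DNA-style input
        for a, b in zip(s, s[1:]):
            if a in ALPHABET and b in ALPHABET:
                yield a, b


def adjacent_mutual_information(seqs):
    """Mutual information (bits) between a nucleotide and its right neighbour.

    0.0 means the dinucleotide counts are exactly what independent adjacent
    positions would give; larger values mean stronger adjacent dependence.
    """
    pairs = Counter(adjacent_pairs(seqs))
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    left, right = Counter(), Counter()  # marginals of first / second base
    for (a, b), n in pairs.items():
        left[a] += n
        right[b] += n
    mi = 0.0
    for (a, b), n in pairs.items():
        p_ab = n / total
        # p(a,b) / (p(a) * p(b)) with p(a) = left[a]/total, p(b) = right[b]/total
        mi += p_ab * math.log2(p_ab * total * total / (left[a] * right[b]))
    return mi


def train_order0(seqs, pseudo=1.0):
    """Order-0 model: P(x_i), ignoring the neighbouring nucleotide."""
    counts = {b: pseudo for b in ALPHABET}
    for s in seqs:
        for b in s.upper().replace("T", "U"):
            if b in ALPHABET:
                counts[b] += 1
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def train_order1(seqs, pseudo=1.0):
    """Order-1 model: P(x_i | x_{i-1}), estimated from adjacent pairs."""
    counts = {a: {b: pseudo for b in ALPHABET} for a in ALPHABET}
    for a, b in adjacent_pairs(seqs):
        counts[a][b] += 1
    model = {}
    for a, row in counts.items():
        row_total = sum(row.values())
        model[a] = {b: c / row_total for b, c in row.items()}
    return model


def log_odds_order1_vs_order0(seq, order0, order1):
    """log P(seq | order-1) - log P(seq | order-0), in nats.

    The first base has no left neighbour, so both models would score it with
    the order-0 probability and it cancels; only the transitions contribute.
    """
    score = 0.0
    for a, b in adjacent_pairs([seq]):
        score += math.log(order1[a][b]) - math.log(order0[b])
    return score
```

For example, after training both models on a toy set of alternating `AC` repeats, `log_odds_order1_vs_order0("ACACAC", o0, o1)` is positive (the order-1 model explains the sequence better), while `"AAAAAA"` scores negative. Real use would need curated ncRNA training data, which this sketch does not address.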
