ProbeAlign: incorporating high-throughput sequencing-based structure probing information into ncRNA homology search

Background: Recent advances in RNA structure probing technologies, including those based on high-throughput sequencing, have improved the accuracy of thermodynamic folding by supplying quantitative, nucleotide-resolution structural information.

Results: In this paper, we present a novel approach, ProbeAlign, that incorporates the reactivities from high-throughput RNA structure probing into ncRNA homology search for functional annotation. To reduce the overhead of structure alignment on large-scale data, the specific pairing patterns in the query sequences are ignored. Instead, the partial structural information about the target sequences that is embedded in the probing data is retrieved to guide the alignment. The structure alignment problem is thus transformed into a sequence alignment problem with additional reactivity information. Benchmark results show that ProbeAlign achieves higher prediction accuracy than filter-based CMsearch while remaining computationally efficient. Applying ProbeAlign to the FragSeq data, which is based on genome-wide structure probing, demonstrates its capability to search for ncRNAs in a large-scale dataset from high-throughput sequencing.

Conclusions: By incorporating high-throughput sequencing-based structure probing information, ProbeAlign can improve the accuracy and efficiency of ncRNA homology search. It is a promising tool for ncRNA functional annotation on genome-wide datasets.

Availability: The source code of ProbeAlign is available at http://genome.ucf.edu/ProbeAlign.
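The reduction described in the abstract, from structure alignment to sequence alignment with an extra reactivity term, can be illustrated with a minimal sketch. This is not the ProbeAlign algorithm or its scoring scheme. It assumes that each query position carries only a paired/unpaired flag (no pairing partners), that target reactivities are normalized to [0, 1] with high values indicating unpaired nucleotides, and that the term is simply added to a Needleman-Wunsch substitution score. All names and parameter values (`MATCH`, `MISMATCH`, `GAP`, `REACTIVITY_WEIGHT`, `reactivity_term`, `align_score`) are hypothetical.

```python
from typing import List

# Hypothetical scoring parameters (assumptions, not values from the paper).
MATCH, MISMATCH, GAP = 2.0, -1.0, -2.0
REACTIVITY_WEIGHT = 1.0


def reactivity_term(query_paired: bool, reactivity: float) -> float:
    """Score how well a target reactivity agrees with the query pairing state.

    Assumes reactivity is normalized to [0, 1]: low means likely paired,
    high means likely unpaired. Returns a value in [-0.5, 0.5] times the weight.
    """
    expected = 0.0 if query_paired else 1.0
    return REACTIVITY_WEIGHT * (0.5 - abs(expected - reactivity))


def align_score(query: str, query_paired: List[bool],
                target: str, reactivity: List[float]) -> float:
    """Global (Needleman-Wunsch) alignment score with an added reactivity term."""
    n, m = len(query), len(target)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP
    for j in range(1, m + 1):
        dp[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Sequence term plus structural agreement term for this column.
            sub = MATCH if query[i - 1] == target[j - 1] else MISMATCH
            sub += reactivity_term(query_paired[i - 1], reactivity[j - 1])
            dp[i][j] = max(dp[i - 1][j - 1] + sub,  # align the two positions
                           dp[i - 1][j] + GAP,      # gap in target
                           dp[i][j - 1] + GAP)      # gap in query
    return dp[n][m]


if __name__ == "__main__":
    query = "GCAUGC"
    paired = [True, True, False, False, True, True]
    consistent = [0.1, 0.1, 0.9, 0.9, 0.1, 0.1]    # agrees with the query flags
    inconsistent = [0.9, 0.9, 0.1, 0.1, 0.9, 0.9]  # contradicts the query flags
    print(align_score(query, paired, query, consistent))
    print(align_score(query, paired, query, inconsistent))
```

In this toy setting, a target whose reactivity profile agrees with the query's paired/unpaired flags scores higher than an identical sequence with a contradictory profile, which is the intuition behind using probing data to guide the alignment. The dynamic program stays quadratic in the sequence lengths, as in ordinary pairwise sequence alignment.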
