A context-based approach to identify the most likely mapping for RNA-seq experiments

BackgroundSequencing of mRNA (RNA-seq) by next generation sequencing technologies is widely used for analyzing the transcriptomic state of a cell. Here, one of the main challenges is the mapping of a sequenced read to its transcriptomic origin. As a simple alignment to the genome will fail to identify reads crossing splice junctions and a transcriptome alignment will miss novel splice sites, several approaches have been developed for this purpose. Most of these approaches have two drawbacks. First, each read is assigned to a location independent on whether the corresponding gene is expressed or not, i.e. information from other reads is not taken into account. Second, in case of multiple possible mappings, the mapping with the fewest mismatches is usually chosen which may lead to wrong assignments due to sequencing errors.ResultsTo address these problems, we developed ContextMap which efficiently uses information on the context of a read, i.e. reads mapping to the same expressed region. The context information is used to resolve possible ambiguities and, thus, a much larger degree of ambiguities can be allowed in the initial stage in order to detect all possible candidate positions. Although ContextMap can be used as a stand-alone version using either a genome or transcriptome as input, the version presented in this article is focused on refining initial mappings provided by other mapping algorithms. Evaluation results on simulated sequencing reads showed that the application of ContextMap to either TopHat or MapSplice mappings improved the mapping accuracy of both initial mappings considerably.ConclusionsIn this article, we show that the context of reads mapping to nearby locations provides valuable information for identifying the best unique mapping for a read. Using our method, mappings provided by other state-of-the-art methods can be refined and alignment accuracy can be further improved.Availabilityhttp://www.bio.ifi.lmu.de/ContextMap.

[1]  Ion I. Mandoiu,et al.  Estimation of alternative splicing isoform frequencies from RNA-Seq data , 2010, Algorithms for Molecular Biology.

[2]  Cole Trapnell,et al.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome , 2009, Genome Biology.

[3]  Brian P. Brunk,et al.  Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM) , 2011, Bioinform..

[4]  Lior Pachter,et al.  Sequence Analysis , 2020, Definitions.

[5]  W. J. Kent,et al.  BLAT--the BLAST-like alignment tool. , 2002, Genome research.

[6]  Marcel H. Schulz,et al.  A Global View of Gene Activity and Alternative Splicing by Deep Sequencing of the Human Transcriptome , 2008, Science.

[7]  B. Frey,et al.  Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing , 2008, Nature Genetics.

[8]  B. Williams,et al.  Mapping and quantifying mammalian transcriptomes by RNA-Seq , 2008, Nature Methods.

[9]  Eric T. Wang,et al.  Alternative Isoform Regulation in Human Tissue Transcriptomes , 2008, Nature.

[10]  P. Green,et al.  Massively parallel sequencing of the polyadenylated transcriptome of C. elegans. , 2009, Genome research.

[11]  Eran Halperin,et al.  Accurate Estimation of Expression Levels of Homologous Genes in RNA-seq Experiments , 2010, RECOMB.

[12]  김삼묘,et al.  “Bioinformatics” 특집을 내면서 , 2000 .

[13]  Colin N. Dewey,et al.  RNA-Seq gene expression estimation with read mapping uncertainty , 2009, Bioinform..

[14]  Thomas Bonfert,et al.  A context-based approach to identify the most likely mapping for RNA-Seq experiments , 2012 .

[15]  Sean M. Grimmond,et al.  RNA-MATE: a recursive mapping strategy for high-throughput RNA-sequencing data , 2009, Bioinform..

[16]  Chuan Yi Tang,et al.  RNASEQR—a streamlined and accurate RNA-seq sequence analysis program , 2011, Nucleic acids research.

[17]  Brian E. Howard,et al.  Towards reliable isoform quantification using RNA-SEQ data , 2009, 2009 IEEE International Conference on Bioinformatics and Biomedicine.

[18]  Derek Y. Chiang,et al.  MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery , 2010, Nucleic acids research.

[19]  Giovanni Manzini,et al.  Opportunistic data structures with applications , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.