UMARS: Un-MAppable Reads Solution

Background
Un-MAppable Reads Solution (UMARS) is a user-friendly web service for retrieving valuable information from sequence reads that cannot be mapped back to reference genomes. Next-generation sequencing (NGS) technology has emerged as a powerful tool for generating high-throughput sequencing data and has been applied to many kinds of biological research. In a typical analysis, adaptor-trimmed NGS reads are first mapped back to reference sequences, either genomes or transcripts. However, a fraction of NGS reads fails to map to the reference sequences. Such un-mappable reads are usually attributed to sequencing errors and discarded without further consideration.

Methods
We are investigating the possible biological relevance and possible sources of un-mappable reads. To that end, we developed UMARS to scan un-mappable reads for viral genomic fragments and for exon-exon junctions of novel alternative splicing isoforms. To map the un-mappable reads, we first collected viral genomes and sequences of exon-exon junctions. We then constructed the UMARS pipeline as an automatic alignment interface.

Results
We demonstrate the applicability of UMARS with two alignment cases. First, we showed that the expected EBV genomic fragments can be detected by UMARS. Second, we detected exon-exon junctions from un-mappable reads. Further experimental validation confirmed the validity of the UMARS pipeline. The UMARS service is freely available to the academic community and can be accessed via http://musk.ibms.sinica.edu.tw/UMARS/.

Conclusions
In this study, we have shown that some un-mappable reads are not caused by sequencing errors. They can originate from viral infection or transcript splicing. Our UMARS pipeline provides another way to examine and recycle the un-mappable reads that are commonly discarded as garbage.
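The exon-exon junction scan described in the Methods can be illustrated with a minimal Python sketch. It is not the UMARS implementation: the function names, the `flank` and `min_overhang` parameters, and the toy exon sequences are all hypothetical, and exact substring matching stands in for whatever aligner the pipeline actually uses. The idea shown is only that a junction library is built by joining the end of one exon to the start of a downstream exon, and that a read counts as a junction hit when it spans the boundary with enough bases on both sides.

```python
from itertools import combinations


def build_junctions(exons, flank):
    """Build a junction library from an ordered list of exon sequences.

    For every pair (i, j) with i < j, join the last `flank` bases of
    exon i to the first `flank` bases of exon j. Returns a dict mapping
    (i, j) to (junction_sequence, boundary_offset).
    """
    junctions = {}
    for i, j in combinations(range(len(exons)), 2):
        left = exons[i][-flank:]
        right = exons[j][:flank]
        junctions[(i, j)] = (left + right, len(left))
    return junctions


def find_junction_hits(read, junctions, min_overhang=4):
    """Return the exon pairs whose junction the read spans.

    Assumption: exact matching only (no mismatches, no reverse
    complement). A hit requires at least `min_overhang` read bases on
    each side of the exon-exon boundary.
    """
    hits = []
    for pair, (seq, boundary) in junctions.items():
        start = seq.find(read)
        while start != -1:
            end = start + len(read)
            if start <= boundary - min_overhang and end >= boundary + min_overhang:
                hits.append(pair)
                break
            start = seq.find(read, start + 1)
    return hits


# Toy data (hypothetical): three short exons of one gene.
exons = ["AAAACCCC", "GGGGTTTT", "ACGTACGT"]
library = build_junctions(exons, flank=6)

# A read that joins the end of exon 0 to the start of exon 2,
# i.e. a transcript in which exon 1 is skipped.
skipping_read = "CCCCACGT"
skipping_hits = find_junction_hits(skipping_read, library)

# A read lying entirely inside exon 0 does not span any boundary.
internal_hits = find_junction_hits("AACCCC", library)
```

In this toy case the first read is reported against the (0, 2) junction only, while the second read is rejected because it never crosses a boundary. A real pipeline would also need to tolerate mismatches and check both strands.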
