Mining RNA–Seq Data for Infections and Contaminations

RNA sequencing (RNA–seq) provides novel opportunities for transcriptomic studies at nucleotide resolution, including the transcriptomics of viruses or microbes infecting a cell. However, standard approaches for mapping the resulting sequencing reads generally ignore sources of expression other than the host cell and are poorly equipped to handle the redundancies and gaps among sequenced microbial and viral genomes. We show that sequencing reads can easily be screened for contaminations and infections using ContextMap, our recently developed mapping software. Mapping-derived statistics make it possible to assess mapping confidence, similarities between species or strains, and misidentifications (e.g., due to missing genome sequences). We evaluate the performance of our approach on three real-life sequencing data sets and compare it to state-of-the-art metagenomics tools. In particular, ContextMap vastly outperformed GASiC and GRAMMy in terms of runtime. In contrast to MEGAN4, it was able to provide individual read mappings to species and to resolve non-unique mappings, thus allowing the identification of misalignments caused by sequence similarities between genomes and by missing genome sequences. Our study illustrates the importance and potential of routinely mining RNA–seq experiments for infections or contaminations by microbes and viruses. With ContextMap, gene expression of infecting agents can be analyzed and novel insights into infection processes and tumorigenesis can be obtained.
