RNA CoMPASS: A Dual Approach for Pathogen and Host Transcriptome Analysis of RNA-Seq Datasets

High-throughput RNA sequencing (RNA-seq) has become an instrumental assay for analyzing multiple aspects of an organism's transcriptome. RNA-seq data can also be used to analyze the microbiome associated with a biological specimen, an application that is gaining interest in the scientific community. Many bioinformatics tools exist for the analysis and visualization of transcriptome data. Despite the availability of this array of next-generation sequencing (NGS) analysis tools, analyzing RNA-seq datasets remains challenging for many biomedical researchers who are not familiar with command-line tools. Here we present RNA CoMPASS, a comprehensive RNA-seq analysis pipeline for the simultaneous analysis of transcriptomes and metatranscriptomes from diverse biological specimens. RNA CoMPASS leverages existing tools and parallel computing technology to facilitate the analysis of even very large datasets. It has a web-based graphical user interface with intrinsic queuing that controls a distributed computational pipeline. RNA CoMPASS was evaluated by analyzing RNA-seq datasets from 45 B-cell samples. Twenty-two of these samples were derived from lymphoblastoid cell lines (LCLs) generated by infecting naïve B cells with Epstein-Barr virus (EBV), while the remaining 23 samples were derived from Burkitt's lymphomas (BLs), some of which arose in part through infection with EBV. As expected, RNA CoMPASS identified EBV in all LCLs and in a fraction of the BLs. Cluster analysis of the human transcriptome component of the RNA CoMPASS output clearly separated the BLs (which have a germinal center-like phenotype) from the LCLs (which have a blast-like phenotype), with evidence of activated MYC signaling and lower interferon and NF-κB signaling in the BLs. Together, this analysis illustrates the utility of RNA CoMPASS for the simultaneous analysis of transcriptome and metatranscriptome data.
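The dual approach described above splits one RNA-seq dataset into two branches: reads that match the host feed the human transcriptome analysis, and the remaining reads are profiled for microbial content. The Python sketch below illustrates that idea only and is not the RNA CoMPASS implementation. It assumes a hypothetical per-read summary (`ReadResult`) already produced by upstream aligners and classifiers, and the reads-per-million cutoff (`min_rpm`) for calling a taxon "detected" is an illustrative assumption, not a documented pipeline parameter.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class ReadResult:
    """Hypothetical per-read summary; not an RNA CoMPASS data structure."""
    read_id: str
    host_gene: Optional[str]  # host gene the read aligned to, or None if not host-mapped
    taxon: Optional[str]      # taxon assigned to a non-host read, or None if unassigned


def split_host_nonhost(
    reads: Iterable[ReadResult],
) -> Tuple[List[ReadResult], List[ReadResult]]:
    """Host-aligned reads go to the transcriptome branch; the rest go to
    the metatranscriptome branch."""
    host: List[ReadResult] = []
    nonhost: List[ReadResult] = []
    for r in reads:
        if r.host_gene is not None:
            host.append(r)
        else:
            nonhost.append(r)
    return host, nonhost


def dual_profile(
    reads: Iterable[ReadResult], min_rpm: float = 10.0
) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Return (host gene counts, detected taxa -> reads per million total reads).

    min_rpm is an assumed, illustrative detection threshold.
    """
    reads = list(reads)
    total = len(reads)
    host, nonhost = split_host_nonhost(reads)

    # Transcriptome branch: simple read counts per host gene.
    gene_counts = Counter(r.host_gene for r in host)

    # Metatranscriptome branch: count assigned non-host reads per taxon,
    # normalize to the total library size, and keep taxa above the cutoff.
    taxon_counts = Counter(r.taxon for r in nonhost if r.taxon is not None)
    detected: Dict[str, float] = {}
    for taxon, n in taxon_counts.items():
        rpm = n * 1_000_000 / total
        if rpm >= min_rpm:
            detected[taxon] = rpm
    return dict(gene_counts), detected


if __name__ == "__main__":
    sample = [
        ReadResult("r1", "MYC", None),
        ReadResult("r2", "MYC", None),
        ReadResult("r3", "CD19", None),
        ReadResult("r4", None, "EBV"),
        ReadResult("r5", None, None),  # unmapped and unclassified
    ]
    genes, taxa = dual_profile(sample)
    print(genes)  # {'MYC': 2, 'CD19': 1}
    print(taxa)   # {'EBV': 200000.0}
```

Normalizing taxon counts to the whole library, rather than to the non-host reads alone, keeps detection calls comparable across samples whose host fraction differs. In a real analysis, per-sample host gene counts like these would be assembled into a matrix before any cluster analysis of the kind described for the BL and LCL samples.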
RNA CoMPASS is freely available at http://rnacompass.sourceforge.net/.
