Hecatomb: An End-to-End Research Platform for Viral Metagenomics

Background: Analysis of viral diversity using modern sequencing technologies offers extraordinary opportunities for discovery. However, these analyses present a number of bioinformatic challenges due to viral genetic diversity and virome complexity. Because viruses lack conserved marker sequences, metagenomic detection of viral sequences requires a non-targeted, random (shotgun) approach. Annotation and enumeration of viral sequences rely on rigorous quality control and effective search strategies against appropriate reference databases. Virome analysis also benefits from the analysis of both individual metagenomic sequences and assembled contigs. Combined, these analyses produce large amounts of data that require sophisticated visualization and statistical tools.

Results: Here we introduce Hecatomb, a bioinformatics platform enabling both read- and contig-based analysis. Hecatomb integrates query information from both amino acid and nucleotide reference sequence databases. Hecatomb integrates data collected throughout the workflow, enabling analyst-driven virome analysis and discovery. Hecatomb is available on GitHub at https://github.com/shandley/hecatomb.

Conclusions: Hecatomb provides a single, modular software solution to the complex tasks required of many virome analyses. We demonstrate the value of the approach by applying Hecatomb to both a host-associated (enteric) and an environmental (marine) virome data set. Hecatomb provided data to distinguish true-positive from false-positive viral sequences in both data sets and revealed complex virome structure at distinct marine reef sites.
