The VirAnnot Pipeline: A Resource for Automated Viral Diversity Estimation and Operational Taxonomy Units Assignation for Virome Sequencing Data

Viral metagenomics relies on high-throughput sequencing and bioinformatic analyses to access the genetic content and diversity of entire viral communities. No universally accepted strategy or tool currently exists to define operational taxonomy units (OTUs) and to evaluate viral alpha or beta diversity from virome data. Here we present a new bioinformatic resource, the VirAnnot (automated viral diversity estimation) pipeline, which performs the automated identification of OTUs. Reverse-position-specific BLAST (RPS-Blastn) is used to detect conserved viral protein motifs. The corresponding contigs are then aligned, and a clustering approach is used to group contigs sharing more than a set identity threshold into the same OTU. A 10% threshold has been validated as producing OTUs that, in many families, reasonably approximate the International Committee on Taxonomy of Viruses (ICTV) taxonomy and can therefore be used as a proxy for viral species.
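The clustering step described above can be illustrated with a minimal sketch. It is not the VirAnnot implementation: the function names (`pairwise_identity`, `cluster_otus`), the single-linkage grouping rule, the gap handling, and the threshold used in the usage note are all assumptions made for illustration. The sketch assumes the contig regions matching a conserved motif have already been aligned to equal length, with `-` marking gaps.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def cluster_otus(aligned, min_identity):
    """Group contigs into OTUs by single linkage (an assumed clustering rule).

    aligned: dict mapping contig name -> aligned sequence.
    Two contigs end up in the same OTU when they are connected by a chain of
    pairs whose identity is strictly greater than min_identity.
    """
    parent = {name: name for name in aligned}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(aligned, 2):
        if pairwise_identity(aligned[a], aligned[b]) > min_identity:
            parent[find(a)] = find(b)

    groups = {}
    for name in aligned:
        groups.setdefault(find(name), []).append(name)
    # Sort members and groups so the result is deterministic.
    return sorted(sorted(members) for members in groups.values())
```

For example, calling `cluster_otus` on three hypothetical aligned fragments with an arbitrary `min_identity` of 0.8 would place two fragments that differ at a single position in one OTU and leave a divergent third fragment in its own OTU. The threshold value here is purely illustrative and is not meant to restate the validated threshold reported above.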