VIGA: a one-stop tool for eukaryotic Virus Identification and Genome Assembly from next-generation-sequencing data

Identifying viruses and assembling their genomes from next-generation-sequencing (NGS) data are essential steps in virome studies. This study presents a one-stop tool named VIGA (available at https://github.com/viralInformatics/VIGA) for eukaryotic virus identification and genome assembly from NGS data. VIGA comprises four modules: identification, taxonomic annotation, assembly and novel virus discovery. It integrates a homology-based method for virus identification with both reference-based and de novo assemblers for accurate and effective assembly of virus genomes. Evaluation on multiple simulated and real virome datasets showed that VIGA assembled more complete virus genomes than its competitors on both metatranscriptomic and metagenomic data, and that it also performed well in assembling virus genomes at the strain level. Finally, VIGA was used to investigate the virome in metatranscriptomic data from the Human Microbiome Project, where it revealed differences in virome composition and positive rate among prediabetes, Crohn’s disease and ulcerative colitis. Overall, VIGA should facilitate the identification and characterization of viromes in future studies.
