VFDB 2019: a comparative pathogenomic platform with an interactive web interface

Abstract
The virulence factor database (VFDB, http://www.mgc.ac.cn/VFs/) is devoted to providing the scientific community with a comprehensive warehouse and online platform for deciphering bacterial pathogenesis. The various combinations, organizations and expression patterns of virulence factors (VFs) underlie the diverse clinical symptoms of pathogen infections. Whole-genome sequencing is currently widely used to characterize potentially novel or variant pathogens, both in emerging outbreaks and in routine clinical practice. However, efficiently characterizing the pathogenomic composition of a genome remains a challenge for microbiologists and physicians with limited bioinformatics skills. We have therefore introduced into VFDB an integrated, automated pipeline, VFanalyzer, that systematically identifies known and potential VFs in complete and draft bacterial genomes. VFanalyzer first constructs orthologous groups from the query genome and the preanalyzed reference genomes in VFDB, to avoid potential false positives caused by paralogs. It then conducts iterative, exhaustive sequence similarity searches against the hierarchical prebuilt datasets of VFDB to accurately identify potential atypical or strain-specific VFs. Finally, through a context-based data refinement process for VFs encoded by gene clusters, VFanalyzer achieves relatively high specificity and sensitivity without manual curation. In addition, a thoroughly optimized interactive web interface presents VFanalyzer reports in a comparative pathogenomic layout for easy online analysis.
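
Grouping orthologs protects against paralogs because a paralog that merely resembles a reference VF is outscored by the true ortholog when the search is run in the reverse direction. As a minimal sketch of that idea, the code below uses reciprocal best hits (RBH), a simplified stand-in that is not necessarily how VFanalyzer builds its orthologous groups. It assumes hits are (query gene, subject gene, bit score) tuples, for example parsed from a tabular similarity-search report; the function and gene names are hypothetical.

```python
from typing import Dict, List, Tuple

# A similarity-search hit: (query gene, subject gene, bit score).
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Map each query gene to its highest-scoring subject gene."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        # Strict ">" keeps the first hit seen when scores tie.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    query_vs_ref: List[Hit], ref_vs_query: List[Hit]
) -> Dict[str, str]:
    """Keep query->reference pairs where each gene is the other's best hit."""
    forward = best_hits(query_vs_ref)
    reverse = best_hits(ref_vs_query)
    return {q: r for q, r in forward.items() if reverse.get(r) == q}


if __name__ == "__main__":
    # qA is the true ortholog of refVF1; qB is a weaker-scoring paralog.
    forward = [("qA", "refVF1", 500.0), ("qB", "refVF1", 300.0)]
    reverse = [("refVF1", "qA", 500.0), ("refVF1", "qB", 300.0)]
    print(reciprocal_best_hits(forward, reverse))  # {'qA': 'refVF1'}
```

Both qA and qB hit refVF1 in the forward search, but refVF1's best hit back into the query genome is qA, so the paralog qB is dropped. Full orthologous-group methods generalize this pairwise test to clusters of genes across many genomes.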

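For VFs encoded by gene clusters, such as secretion systems assembled from many component genes, genomic context matters: isolated single hits are weaker evidence than a co-located set of components. The sketch below illustrates context-based refinement under assumed rules rather than VFanalyzer's published criteria. A cluster is reported as present only when a hypothetical minimum fraction of its components (`min_fraction`) is found on one contig, with neighbouring hits no more than `max_gap` bp apart. The `GeneHit` type, the function name, the component names and both thresholds are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GeneHit:
    """A query gene matched to one component of a reference VF gene cluster."""
    component: str  # component gene of the reference cluster that was hit
    contig: str     # contig (or chromosome) carrying the query gene
    start: int      # start coordinate of the query gene, in bp


def cluster_present(
    hits: List[GeneHit],
    components: List[str],
    min_fraction: float = 0.8,  # hypothetical: share of components required
    max_gap: int = 5000,        # hypothetical: max bp between neighbouring hit starts
) -> bool:
    """Return True if enough cluster components are co-located on one contig."""
    wanted = set(components)
    if not wanted:
        return False

    # Group the relevant hits by contig.
    by_contig: Dict[str, List[GeneHit]] = {}
    for hit in hits:
        if hit.component in wanted:
            by_contig.setdefault(hit.contig, []).append(hit)

    best = 0  # largest number of distinct components in one co-located block
    for contig_hits in by_contig.values():
        contig_hits.sort(key=lambda h: h.start)
        block = {contig_hits[0].component}
        for prev, cur in zip(contig_hits, contig_hits[1:]):
            if cur.start - prev.start <= max_gap:
                block.add(cur.component)  # still within the same neighbourhood
            else:
                best = max(best, len(block))
                block = {cur.component}   # gap too large: start a new block
        best = max(best, len(block))

    return best / len(wanted) >= min_fraction


if __name__ == "__main__":
    parts = ["vfA", "vfB", "vfC", "vfD"]  # hypothetical cluster components
    clustered = [GeneHit(p, "contig1", 1000 + 1500 * i) for i, p in enumerate(parts)]
    lone = [GeneHit("vfC", "contig2", 100)]
    print(cluster_present(clustered, parts))  # True: 4/4 components, 1.5 kb apart
    print(cluster_present(lone, parts))       # False: only 1/4 components found
```

Because draft genomes can split a cluster across contigs, a production refinement step would likely need to tolerate contig breaks; this sketch deliberately evaluates each contig on its own to keep the rule easy to follow.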