CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline

Background: The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics.

Results: CloVR-Comparative runs on annotated complete or draft genome sequences that are either uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. The pipeline performs reference-free multiple whole-genome alignments to determine unique, shared, and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports, detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures), and a database file linked to the Sybil comparative genome browser. Data upload and download, pipeline configuration and monitoring, and access to Sybil are managed through the CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or the Amazon EC2 cloud. Representative datasets (e.g., 40 draft and complete Escherichia coli genomes) are processed in <36 h on a local desktop or at a cost of <$20 on EC2.

Conclusions: CloVR-Comparative allows anybody with Internet access to run comparative genomics projects, while eliminating the need for on-site computational resources and expertise.
