BIGSdb: Scalable analysis of bacterial genome variation at the population level

Background: The opportunities for bacterial population genomics that are being realised by the application of parallel nucleotide sequencing require novel bioinformatics platforms. These must be capable of the storage, retrieval, and analysis of linked phenotypic and genotypic information in an accessible, scalable and computationally efficient manner.

Results: The Bacterial Isolate Genome Sequence Database (BIGSdb) is a scalable, open source, web-accessible database system that meets these needs, enabling phenotype and sequence data, which can range from a single sequence read to whole genome data, to be efficiently linked for a limitless number of bacterial specimens. The system builds on the widely used mlstdbNet software, developed for the storage and distribution of multilocus sequence typing (MLST) data, and incorporates the capacity to define and identify any number of loci and genetic variants at those loci within the stored nucleotide sequences. These loci can be further organised into 'schemes' for isolate characterisation or for evolutionary or functional analyses. Isolates and loci can be indexed by multiple names and any number of alternative schemes can be accommodated, enabling cross-referencing of different studies and approaches. LIMS functionality of the software enables linkage to and organisation of laboratory samples. The data are easily linked to external databases, and fine-grained authentication of access permits multiple users to participate in community annotation by setting up or contributing to different schemes within the database. Some of the applications of BIGSdb are illustrated with the genera Neisseria and Streptococcus. The BIGSdb source code and documentation are available at http://pubmlst.org/software/database/bigsdb/.

Conclusions: Genomic data can be used to characterise bacterial isolates in many different ways, but they can also be efficiently exploited for evolutionary or functional studies. BIGSdb represents a freely available resource that will assist the broader community in the elucidation of the structure and function of bacteria by means of a population genomics approach.
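
The organisation of loci, allelic variants and schemes described in the abstract can be illustrated with a minimal Python sketch. It is not the BIGSdb implementation or its database schema: the class names `Locus` and `Scheme`, the exact-match lookup, and all sequences, allele numbers and the profile identifier are hypothetical toy values chosen only to show how allele designations at several loci combine into a scheme profile.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Locus:
    """A named locus with its known allelic variants (hypothetical model)."""
    name: str
    # Maps an allele sequence to its allele number (toy data, not real alleles).
    alleles: Dict[str, int] = field(default_factory=dict)

    def designate(self, sequence: str) -> Optional[int]:
        """Return the allele number for an exact sequence match, else None."""
        return self.alleles.get(sequence.upper())


@dataclass
class Scheme:
    """An ordered set of loci plus known allelic profiles (hypothetical model)."""
    name: str
    loci: List[Locus]
    # Maps a tuple of allele numbers (one per locus) to a profile identifier.
    profiles: Dict[Tuple[Optional[int], ...], int] = field(default_factory=dict)

    def profile(self, isolate_seqs: Dict[str, str]) -> Tuple[Optional[int], ...]:
        """Designate alleles at every locus; None marks a missing or novel allele."""
        return tuple(
            locus.designate(isolate_seqs[locus.name])
            if locus.name in isolate_seqs else None
            for locus in self.loci
        )

    def profile_id(self, isolate_seqs: Dict[str, str]) -> Optional[int]:
        """Return the identifier of a complete, known profile, else None."""
        return self.profiles.get(self.profile(isolate_seqs))


# Toy example: two loci with made-up sequences and one made-up profile.
locus_a = Locus("locusA", {"ACGT": 1, "ACGA": 2})
locus_b = Locus("locusB", {"TTGA": 1})
scheme = Scheme("toy_scheme", [locus_a, locus_b], {(1, 1): 11})

isolate = {"locusA": "acgt", "locusB": "TTGA"}
print(scheme.profile(isolate), scheme.profile_id(isolate))
```

Because a scheme here is just an ordered list of loci, the same locus object can belong to several schemes, which mirrors the abstract's point that any number of alternative schemes can coexist over the same stored sequences.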
