Data mining tools for Salmonella characterization: application to gel-based fingerprinting analysis

Background: Pulsed-field gel electrophoresis (PFGE) is currently the method most widely and routinely used by the Centers for Disease Control and Prevention (CDC) and state health labs in the United States for Salmonella surveillance and outbreak tracking. The major drawbacks of commercially available PFGE analysis programs have been their difficulty in handling large datasets and their limited range of analysis tools. New analytical tools for PFGE data mining are needed to make full use of the valuable data in large surveillance databases.

Results: In this study, a software package was developed that comprises five bioinformatics approaches, explored and implemented for the analysis and visualization of PFGE fingerprints. The approaches are PFGE band standardization, Salmonella serotype prediction, hierarchical cluster analysis, distance matrix analysis and two-way hierarchical cluster analysis. PFGE band standardization makes cross-group analysis of large datasets possible. The Salmonella serotype prediction approach allows users to predict the serotypes of Salmonella isolates from their PFGE patterns. The hierarchical cluster analysis approach can be used to clarify subtypes and phylogenetic relationships among groups of PFGE patterns. The distance matrix and two-way hierarchical cluster analysis tools allow users to directly visualize the similarities and dissimilarities of any two individual patterns and the inter- and intra-serotype relationships of two or more serotypes. They also provide a summary of the overall relationships between user-selected serotypes and of the band markers that distinguish these serotypes. The functionality of these tools was illustrated on PFGE fingerprinting data from the CDC's PulseNet.

Conclusions: The bioinformatics approaches included in the software package developed in this study were integrated with the PFGE database to enhance the data mining of PFGE fingerprints. Fast and accurate prediction makes it possible to obtain Salmonella serotype information before conventional serological methods are pursued. The development of bioinformatics tools to distinguish PFGE markers and serotype-specific patterns will enhance PFGE data retrieval, interpretation and serotype identification, and will likely accelerate source tracking to identify the Salmonella isolates implicated in foodborne diseases.
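The distance matrix analysis described in the abstract can be illustrated with a minimal sketch. It assumes that each standardized PFGE pattern is reduced to a set of band positions and that patterns are compared with the Jaccard distance; both the representation and the choice of metric are assumptions made for illustration, and the isolate names and band positions are hypothetical, not data from the study.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two sets of band positions (0 = identical)."""
    union = a | b
    if not union:
        # Two empty patterns are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(patterns):
    """Build a symmetric pairwise distance matrix as a nested dict."""
    names = sorted(patterns)
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for x, y in combinations(names, 2):
        d = jaccard_distance(patterns[x], patterns[y])
        matrix[x][y] = d
        matrix[y][x] = d
    return matrix


# Hypothetical band-position sets after standardization (not real PulseNet data).
patterns = {
    "isolate_A": {1, 3, 5, 8},
    "isolate_B": {1, 3, 5, 9},
    "isolate_C": {2, 4, 6, 7},
}

dm = distance_matrix(patterns)
for name in sorted(dm):
    print(name, [round(dm[name][other], 2) for other in sorted(dm)])
```

In this sketch, patterns that share most bands end up close together and patterns with no bands in common are maximally distant. A matrix of this kind could then be passed to a hierarchical clustering routine to group patterns.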
