Towards an integrated food safety surveillance system: a simulation study to explore the potential of combining genomic and epidemiological metadata

Foodborne infection results from exposure to complex, dynamic food systems. The efficiency of foodborne infection is driven by ongoing shifts in genetic machinery. Next-generation sequencing technologies can provide high-fidelity data about the genetics of a pathogen. However, food safety surveillance systems do not currently provide epidemiological metadata of similar fidelity to associate with the genetic data. As a consequence, it is rarely possible to transform genetic data into actionable knowledge that can inform risk assessment or prevent outbreaks. Big data approaches are touted as a revolution in decision support and offer a potentially attractive method for closing the gap between the fidelity of genetic and epidemiological metadata for food safety surveillance. We therefore developed a simple food chain model to investigate the potential benefits of combining ‘big’ data sources, including both genetic and high-fidelity epidemiological metadata. Our results suggest that, as for any surveillance system, the collected data must be relevant and must characterize the important dynamics of the system if we are to properly understand risk. This points to the need for careful data curation, in contrast to the more ambitious claims of big data proponents that unstructured and unrelated data sources can be combined to generate consistent insight. Notably, the biggest influencers of foodborne infection risk were contamination load and processing temperature, not genotype. Understanding food chain dynamics would therefore probably generate insight into foodborne risk more effectively than describing the hazard in ever more detail in terms of genotype.
