A population data-driven workflow for COVID-19 modeling and learning

CityCOVID is a detailed agent-based model that represents the behaviors and social interactions of 2.7 million residents of Chicago as they move between and colocate in 1.2 million distinct places, including households, schools, workplaces, and hospitals, as determined by individual hourly activity schedules and dynamic behaviors such as isolating because of symptom onset. Disease progression dynamics incorporated within each agent track transitions between possible COVID-19 disease states, based on heterogeneous agent attributes, exposure through colocation, and effects of protective behaviors of individuals on viral transmissibility. Throughout the COVID-19 epidemic, CityCOVID model outputs have been provided to city, county, and state stakeholders in response to evolving decision-making priorities, while incorporating emerging information on SARS-CoV-2 epidemiology. Here we describe our efforts to integrate our high-performance epidemiological simulation model with large-scale machine learning to develop a generalizable, flexible, and performant analytical platform for planning and crisis response.
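To make the mechanics described above concrete, the following is a minimal toy sketch of an hourly colocation model, not the CityCOVID implementation. The four-state progression (S, E, I, R), the constants BETA, E_HOURS, SYMPTOM_DELAY, and I_HOURS, the Agent fields, and the step function are all hypothetical and chosen only for illustration. It shows agents following hourly schedules, exposure through sharing a place with infectious agents, a protective-behavior factor that scales transmissibility, and isolation at home after an assumed symptom onset.

```python
import random
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical disease states and parameters, for illustration only.
S, E, I, R = "S", "E", "I", "R"
BETA = 0.05         # assumed per-infectious-contact, per-hour transmission probability
E_HOURS = 48        # assumed hours spent exposed before becoming infectious
SYMPTOM_DELAY = 24  # assumed hours infectious before symptoms trigger isolation
I_HOURS = 120       # assumed hours spent infectious before recovering


@dataclass
class Agent:
    home: int                # place id of the agent's household
    schedule: list           # 24 place ids, one per hour of the day
    protection: float = 0.0  # 0 = no protective behavior, 1 = fully protected
    state: str = S
    hours_in_state: int = 0
    isolating: bool = False

    def place_at(self, hour):
        # Isolating agents stay home regardless of their schedule.
        return self.home if self.isolating else self.schedule[hour % 24]


def step(agents, hour, rng):
    """Advance the toy model by one simulated hour."""
    # Group agents by the place they occupy this hour (colocation).
    occupants = defaultdict(list)
    for a in agents:
        occupants[a.place_at(hour)].append(a)

    # Exposure: each colocated infectious agent is an independent chance
    # of transmission, scaled down by the susceptible agent's protection.
    newly_exposed = []
    for group in occupants.values():
        n_inf = sum(1 for a in group if a.state == I)
        if n_inf == 0:
            continue
        for a in group:
            if a.state == S:
                p_contact = BETA * (1.0 - a.protection)
                p = 1.0 - (1.0 - p_contact) ** n_inf
                if rng.random() < p:
                    newly_exposed.append(a)

    # Disease progression for agents already exposed or infectious.
    for a in agents:
        a.hours_in_state += 1
        if a.state == E and a.hours_in_state >= E_HOURS:
            a.state, a.hours_in_state = I, 0
        elif a.state == I:
            if a.hours_in_state >= I_HOURS:
                a.state, a.hours_in_state, a.isolating = R, 0, False
            elif a.hours_in_state >= SYMPTOM_DELAY:
                a.isolating = True

    # New exposures take effect at the end of the hour.
    for a in newly_exposed:
        a.state, a.hours_in_state = E, 0
```

A run would call step once per simulated hour with a seeded random.Random instance so results are reproducible. A model at the scale described in the abstract would need distributed execution and far richer agent attributes and disease states than this sketch includes.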
