Simulation Framework for Realistic Large-scale Individual-level Health Data Generation

We propose a general framework for realistic data generation and simulation of complex systems in the health domain. The main use cases of the framework are predicting the development of risk factors and disease occurrence, evaluating the impact of interventions and policy decisions, and developing statistical methods. We present the fundamentals of the framework using rigorous mathematical definitions. The framework supports calibration to a real population as well as various manipulations and data collection processes. The freely available open-source implementation in R uses efficient data structures, parallel computing, and fast random number generation, which ensure reproducibility and scalability. With the framework, it is possible to run daily-level simulations for populations of millions of individuals over decades of simulated time. An example on the occurrence of stroke, type 2 diabetes, and mortality illustrates the use of the framework in the Finnish context. In the example, we demonstrate the data-collection functionality by studying the impact of non-participation on the estimated risk models.
