The design of simulation studies in medical statistics

Simulation studies use computer-intensive procedures to assess the performance of statistical methods in relation to a known truth. Such evaluation cannot be achieved with studies of real data alone. Designing high-quality simulations that reflect the complex situations seen in practice, such as those in prognostic factor studies, is not a simple process. Unfortunately, very few published simulation studies provide enough detail for readers to understand fully how the study was designed. When planning a simulation study, it is recommended that a detailed protocol be produced, specifying how the study will be performed, analysed and reported. This paper sets out the considerations that matter when designing any simulation study, including defining the specific objectives of the study, determining the procedures for generating the data sets, and choosing the number of simulations to perform. A checklist highlighting these considerations is provided. A small review of the literature identifies current practice in published simulation studies.
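As an illustration of these design elements, the following minimal sketch runs a toy simulation study in Python. Everything specific in it is an assumption made for illustration and is not taken from the paper: the data-generating model (normal data with a known mean), the method under evaluation (the sample mean with a normal-approximation 95% interval), the performance measures, the parameter values, and the function names `run_simulation` and `sims_needed`. The second function uses a standard precision argument, in which the number of simulations is chosen so that the Monte Carlo estimate lies within a stated accuracy of the true value with roughly 95% confidence.

```python
import math
import random
import statistics


def run_simulation(n_sims=1000, n_obs=50, true_mean=2.0, true_sd=1.0, seed=12345):
    """Evaluate the sample mean against a known true mean (illustrative only)."""
    rng = random.Random(seed)  # fixed seed so the whole study can be reproduced
    estimates = []
    covered = 0
    for _ in range(n_sims):
        # Generate one data set from the assumed model with known truth.
        data = [rng.gauss(true_mean, true_sd) for _ in range(n_obs)]
        # Apply the method under evaluation.
        est = statistics.fmean(data)
        se = statistics.stdev(data) / math.sqrt(n_obs)
        # Record whether the normal-approximation 95% interval covers the truth.
        covered += (est - 1.96 * se) <= true_mean <= (est + 1.96 * se)
        estimates.append(est)
    empirical_se = statistics.stdev(estimates)
    return {
        "bias": statistics.fmean(estimates) - true_mean,
        "empirical_se": empirical_se,
        "coverage": covered / n_sims,
        # Monte Carlo standard error of the estimated bias.
        "mc_se_of_bias": empirical_se / math.sqrt(n_sims),
    }


def sims_needed(sd, accuracy, z=1.96):
    """Simulations needed for the estimate to be within `accuracy` of the
    truth, given the estimator's standard deviation `sd`: (z * sd / accuracy)^2."""
    return math.ceil((z * sd / accuracy) ** 2)


if __name__ == "__main__":
    for name, value in run_simulation().items():
        print(f"{name}: {value:.4f}")
    print("simulations needed:", sims_needed(sd=1.0, accuracy=0.1))
```

Fixing the seed and stating the number of simulations, the data-generating model and the performance measures in advance is the kind of detail a simulation protocol would record, so that the results can be reproduced and reported in full.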
