Combining fractional polynomial model building with multiple imputation

Multivariable fractional polynomial (MFP) models are commonly used in medical research. The datasets to which MFP models are applied often contain covariates with missing values. To handle the missing values, we describe methods for combining multiple imputation with MFP modelling, considering three issues in turn: first, how to impute so that the imputation model does not favour certain fractional polynomial (FP) models over others; second, how to estimate the FP exponents in multiply imputed data; and third, how to choose between models of differing complexity. Two imputation methods are outlined for different settings. For model selection, methods based on Wald‐type statistics and weighted likelihood‐ratio tests are proposed and evaluated in simulation studies. The Wald‐based method is very slightly better at estimating FP exponents. Type I error rates are very similar for the two methods, although slightly less well controlled than in the analysis of complete records; however, there is potential for substantial gains in power over the analysis of complete records. We illustrate the two methods on a dataset from five trauma registries for which a prognostic model has previously been published, contrasting the selected models with the model obtained by analysing the complete records only. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
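As a rough illustration of the ingredients named above, the following Python sketch shows a first-degree FP transformation and the pooling of a single coefficient across imputed datasets by Rubin's rules, followed by a Wald-type statistic. It is not the authors' procedure, which the abstract does not specify in detail. The power set is the conventional one for FP models, and the function names (`fp1_transform`, `pool_rubin`, `wald_statistic`) are hypothetical, introduced only for this example.

```python
import math
from statistics import mean, variance

# Conventional FP power set; by convention, power 0 denotes log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp1_transform(x, p):
    """First-degree FP transformation of a positive value x with power p."""
    if x <= 0:
        raise ValueError("FP transformations require x > 0")
    return math.log(x) if p == 0 else x ** p


def pool_rubin(estimates, variances):
    """Pool one scalar estimate over m imputed datasets using Rubin's rules.

    estimates: per-imputation point estimates of the coefficient
    variances: per-imputation squared standard errors
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)            # pooled point estimate
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


def wald_statistic(estimate, total_variance):
    """Wald-type statistic for a single pooled coefficient (assumed scalar case)."""
    return estimate ** 2 / total_variance


# Illustrative numbers only; they do not come from the paper.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
stat = wald_statistic(pooled, total_var)
```

In practice each candidate power in `FP_POWERS` would be fitted in every imputed dataset before pooling; how the competing powers are then compared is exactly the question the two proposed methods address.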
