Joint modelling rationale for chained equations

Background
Chained equations imputation is widely used in medical research. Because it uses a set of conditional models, it is more flexible than joint modelling imputation for imputing different types of variables (e.g. binary, ordinal or unordered categorical). However, chained equations imputation does not correspond to drawing from a joint distribution when the conditional models are incompatible. Concurrently with our work, other authors have shown the equivalence of the two imputation methods in large samples.

Methods
Taking a different approach, we prove, in finite samples, sufficient conditions for chained equations and joint modelling to yield imputations from the same predictive distribution. Further, we apply this proof in four specific cases and conduct a simulation study that explores the consequences when the conditional models are compatible but the conditions are otherwise not satisfied.

Results
We provide an additional "non-informative margins" condition which, together with compatibility, is sufficient. We show that the non-informative margins condition is not satisfied, despite compatible conditional models, in a situation as simple as two continuous variables and one binary variable. Our simulation study demonstrates that, as a consequence of this violation, order effects can occur; that is, systematic differences depending upon the ordering of the variables in the chained equations algorithm. However, the order effects appear to be small, especially when associations between variables are weak.

Conclusions
Because chained equations imputation is typically used in medical research for datasets containing different types of variables, researchers must be aware that order effects are likely to be ubiquitous; however, our results suggest that they may be small enough to be negligible.
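The abstract describes chained equations only in words. Below is a minimal, standard-library Python sketch of the algorithm in a setting loosely modelled on the one the Results single out (two continuous variables and one binary variable). It is not the authors' implementation, and everything in it is an assumption chosen for illustration. The data come from a hypothetical general location model: binary X, then (Y1, Y2) given X bivariate normal with a common covariance matrix. Under that model, normal linear regressions for Y1 and Y2 and a main-effects logistic regression for X form compatible conditional models. Values are missing completely at random, and all function names (`simulate`, `chained_equations`, `fit_linear`, `fit_logistic`) are invented here. For brevity, each step plugs in point estimates instead of drawing the regression parameters from their posterior, so this is an "improper" simplification. The `order` argument is the variable visiting order whose effect the simulation study examines.

```python
import math
import random


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def expit(eta):
    eta = max(-30.0, min(30.0, eta))  # clip so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-eta))


def fit_linear(rows, y):
    """OLS; each row already starts with 1.0 for the intercept. Returns (beta, residual SD)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - dot(beta, r)) ** 2 for r, yi in zip(rows, y))
    return beta, math.sqrt(rss / (len(y) - p))


def fit_logistic(rows, y, iters=25):
    """Maximum-likelihood logistic regression by Newton-Raphson."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # Fisher information X'WX
        for r, yi in zip(rows, y):
            mu = expit(dot(beta, r))
            for i in range(p):
                grad[i] += (yi - mu) * r[i]
                for j in range(p):
                    info[i][j] += mu * (1.0 - mu) * r[i] * r[j]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta


def chained_equations(data, order, n_cycles=10, binary_col=2, seed=0):
    """One imputation of rows where None marks a missing value; `order` is the visit order."""
    rng = random.Random(seed)
    n, ncol = len(data), len(data[0])
    miss = [[v is None for v in row] for row in data]
    cur = [list(row) for row in data]
    for j in range(ncol):  # initialise with random draws from the observed values
        obs = [row[j] for row in data if row[j] is not None]
        for i in range(n):
            if miss[i][j]:
                cur[i][j] = rng.choice(obs)
    for _ in range(n_cycles):
        for j in order:
            # Regress column j on an intercept and the current values of all other columns.
            design = [[1.0] + [cur[i][k] for k in range(ncol) if k != j] for i in range(n)]
            obs_i = [i for i in range(n) if not miss[i][j]]
            rows, y = [design[i] for i in obs_i], [cur[i][j] for i in obs_i]
            if j == binary_col:
                beta, sd = fit_logistic(rows, y), None
            else:
                beta, sd = fit_linear(rows, y)
            for i in range(n):  # redraw missing entries from the fitted conditional model
                if miss[i][j]:
                    eta = dot(beta, design[i])
                    cur[i][j] = float(rng.random() < expit(eta)) if sd is None else rng.gauss(eta, sd)
    return cur


def simulate(n=300, p_miss=0.2, seed=1):
    """Hypothetical model: X ~ Bernoulli(0.5); (Y1, Y2) | X normal, means (X, X), unit
    variances, correlation 0.5. Each value is missing completely at random with prob p_miss."""
    rng = random.Random(seed)
    full, data = [], []
    for _ in range(n):
        x = float(rng.random() < 0.5)
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        row = [x + z1, x + 0.5 * z1 + math.sqrt(0.75) * z2, x]
        full.append(row)
        data.append([None if rng.random() < p_miss else v for v in row])
    return full, data


if __name__ == "__main__":
    _, data = simulate()
    miss_y1 = [i for i, row in enumerate(data) if row[0] is None]
    for order in ([0, 1, 2], [2, 1, 0]):
        imp = chained_equations(data, order)
        print(order, sum(imp[i][0] for i in miss_y1) / len(miss_y1))
```

A single run per ordering, as in the `__main__` block, cannot separate an order effect from Monte Carlo noise. Detecting the systematic differences the abstract describes would require averaging over many simulated datasets and seeds, and the abstract suggests such differences are small.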
