Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation

We propose a remedy for the discrepancy between the way political scientists analyze data with missing values and the recommendations of the statistics community. Methodologists and statisticians agree that “multiple imputation” is a better approach to the problem of missing data scattered through one’s explanatory and dependent variables than the methods currently used in applied data analysis. The discrepancy occurs because the computational algorithms used to apply the best multiple imputation models have been slow, difficult to implement, impossible to run with existing commercial statistical packages, and demanding of considerable expertise. We adapt an algorithm and use it to implement a general-purpose multiple imputation model for missing data. This algorithm is considerably faster and easier to use than the leading method recommended in the statistics literature. We also quantify the risks of current missing data practices, illustrate how to use the new procedure, and evaluate this alternative using simulated data as well as actual empirical examples. Finally, we offer easy-to-use software that implements all methods discussed.
