Correcting for Survey Misreports Using Auxiliary Information with an Application to Estimating Turnout

Misreporting is a problem that plagues researchers who use survey data. In this paper, we give conditions under which misreporting will lead to incorrect inferences. We then develop a model that corrects for misreporting using auxiliary information, usually from an earlier or pilot validation study. The correction is implemented via Markov chain Monte Carlo (MCMC) methods, which allow us to correct for other problems in surveys, such as non-response. It lets researchers continue to use non-validated data to make inferences. The model, while fully general, is developed in the context of estimating models of turnout from American National Election Studies (ANES) data.
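
To illustrate why misreporting distorts inference and how validation information can undo it, here is a minimal sketch in Python. It is not the MCMC model developed in the paper; it is a simple moment-based correction that assumes the misreport rates are already known from a validation study and do not vary with covariates. The function names and all numeric rates are hypothetical, chosen only for illustration.

```python
def observed_rate(true_rate, false_pos, false_neg):
    """Share of respondents who report voting.

    true_rate : actual turnout rate in the population
    false_pos : P(report voting | did not vote), i.e. overreporting
    false_neg : P(report not voting | voted), i.e. underreporting
    """
    return true_rate * (1.0 - false_neg) + (1.0 - true_rate) * false_pos


def corrected_rate(reported_rate, false_pos, false_neg):
    """Invert observed_rate to recover the true rate from the reported one.

    Assumes the two misreport rates come from a validation study
    (an assumption of this sketch, not a claim about the paper's model).
    """
    denom = 1.0 - false_pos - false_neg
    if denom <= 0.0:
        raise ValueError("misreport rates too large to identify the true rate")
    estimate = (reported_rate - false_pos) / denom
    # Sampling noise can push the estimate outside [0, 1]; clip it back.
    return min(1.0, max(0.0, estimate))


# Hypothetical rates: 20% of nonvoters claim to have voted,
# 2% of voters say they did not.
reported = observed_rate(true_rate=0.50, false_pos=0.20, false_neg=0.02)
recovered = corrected_rate(reported, false_pos=0.20, false_neg=0.02)
print(f"reported turnout: {reported:.2f}, corrected turnout: {recovered:.2f}")
```

With these illustrative rates, reported turnout overstates actual turnout, and the correction recovers the true rate only because the misreport rates are taken as known. A full model such as the one described above would instead treat them as uncertain quantities estimated from the auxiliary data.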
