A GMM Approach for Dealing with Missing Data on Regressors

Missing data are a common challenge facing empirical researchers. This paper presents a general GMM framework and estimator for dealing with missing values of an explanatory variable in linear regression analysis. The GMM estimator is efficient under the assumptions needed for consistency of linear imputation methods. The estimator, which also allows for a specification test of the missingness assumptions, is compared to the existing linear imputation, complete data, and dummy variable methods commonly used in empirical research. The dummy variable method is generally inconsistent even when data are missing completely at random and, when it is consistent, can be less efficient than the complete data method.
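The claim about the dummy variable method can be illustrated with a small simulation. The following is a minimal sketch, not the paper's GMM estimator: the data-generating process (a regressor `x` correlated with a fully observed covariate `z`, all true coefficients equal to 1, and half of the `x` values deleted completely at random) and the function names are assumptions chosen for illustration only.

```python
import numpy as np


def simulate(n=200_000, seed=0):
    """Draw an illustrative sample; every design choice here is an assumption."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                 # always-observed covariate
    x = z + rng.normal(size=n)             # regressor correlated with z
    y = 1.0 + 1.0 * x + 1.0 * z + rng.normal(size=n)
    missing = rng.random(n) < 0.5          # x missing completely at random
    return y, x, z, missing


def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def complete_case(y, x, z, missing):
    """Complete data method: drop every observation with x missing."""
    obs = ~missing
    X = np.column_stack([np.ones(obs.sum()), x[obs], z[obs]])
    return ols(y[obs], X)


def dummy_variable(y, x, z, missing):
    """Dummy variable method: set missing x to 0 and add a missingness indicator."""
    x_filled = np.where(missing, 0.0, x)
    X = np.column_stack([np.ones(len(y)), x_filled, z, missing.astype(float)])
    return ols(y, X)


y, x, z, missing = simulate()
cc = complete_case(y, x, z, missing)    # order: intercept, x, z
dv = dummy_variable(y, x, z, missing)   # order: intercept, x, z, indicator

print("complete data  (x, z):", cc[1], cc[2])
print("dummy variable (x, z):", dv[1], dv[2])
```

In this design the complete data estimates of the coefficients on `x` and `z` should sit close to their true value of 1, while the dummy variable estimates move away from it. The dummy variable regression forces a single coefficient on `z` across observations with and without `x`, even though `z` partly proxies for `x` wherever `x` is missing.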
