Regression models for choice-based samples with misclassification in the response variable

Abstract. In this paper, we provide a general framework for handling misclassification of the response variable in choice-based samples. The sampling distribution of the contaminated data is written as a function of the error-free conditional distribution of the dependent variable given the covariates and the conditional misclassification probabilities of the observed response given its latent value. We propose an extension of Imbens’ (Econometrica 60 (1992) 1187) efficient generalized method of moments estimator to estimate this model, and outline a specification test to detect the presence of this type of measurement error. A Monte Carlo simulation study investigates the performance of both the estimators and the test, with very encouraging results.
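The structure described in the abstract can be illustrated with a minimal sketch. It is not the paper's notation or estimator: the binary logit form, the function names, the parameter values, and the assumption that strata are defined by the observed (possibly misclassified) response are all illustrative choices. The sketch first mixes the error-free response probabilities with the misclassification probabilities, then applies the standard choice-based reweighting by sample shares `H` over population shares `Q`.

```python
import math

def logit_probs(x, beta):
    """Error-free [P(Y=0|x), P(Y=1|x)] under a binary logit (illustrative choice)."""
    index = sum(b * v for b, v in zip(beta, x))
    p1 = 1.0 / (1.0 + math.exp(-index))
    return [1.0 - p1, p1]

def observed_probs(latent, misclass):
    """Probabilities of the observed response.

    misclass[k][j] = P(observed = j | latent = k); each row sums to 1.
    """
    n_obs = len(misclass[0])
    return [sum(latent[k] * misclass[k][j] for k in range(len(latent)))
            for j in range(n_obs)]

def sampled_probs(observed, H, Q):
    """Response probabilities given x within a choice-based sample.

    Assumes strata are defined by the observed response: H[j] is the
    sample share and Q[j] the population share of observed outcome j.
    """
    weights = [H[j] / Q[j] * observed[j] for j in range(len(observed))]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical values, chosen only to exercise the functions.
x = [1.0, 0.5]
beta = [-1.0, 2.0]                      # index = 0, so P(Y=1|x) = 0.5
misclass = [[0.95, 0.05],               # latent 0 reported as 1 with prob. 0.05
            [0.10, 0.90]]               # latent 1 reported as 0 with prob. 0.10
latent = logit_probs(x, beta)
observed = observed_probs(latent, misclass)
in_sample = sampled_probs(observed, H=[0.5, 0.5], Q=[0.8, 0.2])
```

With an identity misclassification matrix the observed probabilities reduce to the error-free ones, and with `H` equal to `Q` the sample is effectively random, so the reweighting step leaves the probabilities unchanged.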

[1] Steven R. Lerman, et al. The Estimation of Choice Probabilities from Choice Based Samples, 1977.

[2] J. Copas. Binary Regression Models for Contaminated Data, 1988.

[3] Guido W. Imbens, et al. Imposing Moment Restrictions from Auxiliary Data by Weighting, 1996, Review of Economics and Statistics.

[4] Charles F. Manski, et al. Estimation of Response Probabilities From Augmented Retrospective Observations, 1985.

[5] Jeffrey M. Wooldridge, et al. Asymptotic properties of weighted M-estimators for variable probability samples, 1999.

[6] J. Hausman, et al. Misclassification of the dependent variable in a discrete-response setting, 1998.

[7] R. Carroll, et al. On Robustness in the Logistic Regression Model, 1993.

[8] Diane Lambert, et al. Zero-inflated Poisson regression, with an application to defects in manufacturing, 1992.

[9] J. Poterba, et al. Unemployment Benefits and Labor Market Transitions: A Multinomial Logit Model with Errors in Classification, 1995.

[10] W. Newey, et al. Large sample estimation and hypothesis testing, 1986.

[11] Jeffrey M. Wooldridge, et al. ASYMPTOTIC PROPERTIES OF WEIGHTED M-ESTIMATORS FOR STANDARD STRATIFIED SAMPLES, 2001, Econometric Theory.

[12] Pravin K. Trivedi, et al. Regression Analysis of Count Data, 1998.

[13] A S Whittemore, et al. Poisson regression with misclassified counts: application to cervical cancer, 1991, Journal of the Royal Statistical Society. Series C, Applied statistics.

[14] G. Imbens, et al. Combining Micro and Macro Data in Microeconometric Models, 1994.

[15] David R. Cox. The analysis of binary data, 1970.

[16] S. Cosslett, et al. Maximum likelihood estimator for choice-based samples, 1981.

[17] Charles F. Manski, et al. Alternative Estimators and Sample Designs for Discrete Choice Analysis, 1981.

[18] Marcel G. Dagenais, et al. The dogit model, 1979.

[19] Tony Lancaster, et al. Efficient estimation and stratified sampling, 1996.

[20] D. Ruppert, et al. Measurement Error in Nonlinear Models, 1995.

[21] Raymond J. Carroll, et al. On robust estimation in logistic case-control studies, 1993.

[22] D. Pregibon. Resistant fits for some commonly used logistic models with medical application, 1982, Biometrics.

[23] Jerry A. Hausman, et al. Semiparametric Estimation with Mismeasured Dependent Variables: An Application to Duration Models for Unemployment Spells, 1999.

[24] D. Cox, et al. Analysis of Binary Data (2nd ed.), 1990.

[25] Wagner A. Kamakura, et al. Book Review: Structural Analysis of Discrete Data with Econometric Applications, 1982.

[26] Charles L. Odoroff, et al. Log-Linear Models for Doubly Sampled Categorical Data Fitted by the EM Algorithm, 1985.

[27] Guido W. Imbens, et al. An efficient method of moments estimator for discrete choice models with choice-based sampling, 1992.

[28] Dale J. Poirier, et al. REVISING BELIEFS IN NONIDENTIFIED MODELS, 1998, Econometric Theory.

[29] A. Ekholm, et al. Exponential family non‐linear models for categorical data with errors of observation, 1987.

[30] Eric R. Ziegel, et al. Analysis of Binary Data (2nd ed.), 1991.

[31] R. Pyke, et al. Logistic disease incidence models and case-control studies, 1979.