Critical assessment of five methods to correct for endogeneity in discrete-choice models

Endogeneity often arises in discrete-choice models, precluding consistent estimation of the model parameters, but it is habitually neglected in practical applications. The purpose of this article is to help close that gap by assessing five methods to address endogeneity in this context: the use of Proxies (PR); the two-step Control-Function (CF) method; the simultaneous estimation of the CF method via Maximum-Likelihood (ML); the Multiple Indicator Solution (MIS); and the integration of Latent-Variables (LV). The assessment is first made qualitatively, in terms of the formulation, normalization and data needs of each method. Then the evaluation is made quantitatively, by means of a Monte Carlo experiment that studies the finite-sample properties under a unified data generation process and analyzes the impact of common flaws. The methods studied differ notably in the range of problems that they can address; their underlying assumptions; the difficulty of gathering the auxiliary variables needed to apply them; and their practicality, both in terms of the need for coding and their computational burden. The analysis developed in this article shows that PR is formally inappropriate for many cases, but it is easy to apply and often corrects in the right direction. CF is also easy to apply with canned software, but requires instrumental variables, which may be hard to collect in various contexts. Since CF is estimated in two stages, it may also compromise efficiency and complicate the estimation of standard errors. ML guarantees efficiency and direct estimation of the standard errors, but at the cost of a larger computational burden, required for the estimation of a multifold integral, with potential difficulties in identification, while retaining the difficulty of gathering proper instrumental variables. The MIS method appears relatively easy to apply and requires indicators that may be easier to obtain in various cases. Finally, the LV approach appears to be the most versatile method, but at a high cost in computational burden, problems of identification and limitations in the capability of writing proper structural equations for the latent variable.
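
To make the two-step CF idea concrete, the following is a minimal sketch under stated assumptions, not the experiment reported in the article. It assumes a binary logit with one endogenous attribute `x`, one instrument `z`, and an omitted term `v` that enters both `x` and the utility. All names and parameter values (`BETA_X`, `GAMMA_V`, `fit_logit`, the sample size) are hypothetical choices made only for illustration. The first stage regresses `x` on `z` by ordinary least squares; the second stage adds the first-stage residual to the utility as an extra regressor.

```python
import numpy as np


def fit_logit(X, y, iters=50, tol=1e-10):
    """Binary logit fitted by Newton-Raphson; returns the coefficient vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical data generation process (illustrative values, not the article's).
BETA_X = -1.0   # true coefficient of the endogenous attribute
GAMMA_V = 2.0   # effect of the omitted term on utility
N = 20000

rng = np.random.default_rng(0)
z = rng.normal(size=N)            # instrument: affects x, excluded from utility
v = rng.normal(size=N)            # omitted term: affects both x and utility
x = z + v                         # endogenous attribute
eps = rng.logistic(size=N)        # logit error
y = (BETA_X * x + GAMMA_V * v + eps > 0).astype(float)

ones = np.ones(N)

# Naive logit: ignores v, so x is correlated with the model error.
beta_naive = fit_logit(np.column_stack([ones, x]), y)[1]

# Stage 1: OLS of x on the instrument; keep the residual.
Z = np.column_stack([ones, z])
first_stage, *_ = np.linalg.lstsq(Z, x, rcond=None)
v_hat = x - Z @ first_stage

# Stage 2: logit with the residual added as a control function.
beta_cf = fit_logit(np.column_stack([ones, x, v_hat]), y)[1]

print(f"true: {BETA_X:.2f}  naive: {beta_naive:.2f}  two-step CF: {beta_cf:.2f}")
```

In this simplified setting the residual captures the whole omitted term, so the scale of the corrected model matches the true one; in general the CF correction recovers the parameters only up to scale. The second-stage standard errors are not reported here because, as noted above, they would need a correction for the two-stage estimation.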
