A control-function approach to correct for endogeneity in discrete choice models estimated on SP-off-RP data and contrasts with an earlier FIML approach by Train & Wilson

Abstract. It is common practice to build Stated Preference (SP) attributes and alternatives from observed Revealed Preference (RP) choices in order to increase realism. While many surveys pivot all alternatives around an observed choice, others use more adaptive approaches in which the changes depend on which alternative was chosen in the RP setting. For example, in SP-off-RP data, the alternative chosen in the RP setting is worsened in the SP setting and the other alternatives are improved, to induce a change in behaviour. This facilitates the creation of meaningful trade-offs or tipping points but introduces endogeneity. This source of endogeneity was largely ignored until Train and Wilson (T&W) proposed a full information maximum likelihood (FIML) solution that can be implemented with simulation. In this article, we propose a limited information maximum likelihood (LIML) approach to the SP-off-RP problem that does not need simulation, can be applied with standard software, and uses data that are already available for the stated problem. The proposed method is an application of the control-function (CF) method to correct for endogeneity in discrete choice models, using the RP attributes as instrumental variables. We discuss the theoretical and practical advantages and disadvantages of the CF and T&W methods and illustrate them using Monte Carlo and real data. Results show that, while the T&W method may be more efficient in theory, it may fail to yield consistent estimates when it does not properly account for the data generation process, for example when an exogenous source of correlation exists among the SP choice tasks. The CF method, by contrast, is less sensitive to assumptions about the data generation process, is considerably easier to apply with standard software, and does not require simulation, which should facilitate its adoption and the more extensive use of SP-off-RP data.
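The two-stage logic of a control-function correction can be illustrated with a minimal sketch. This is not the authors' specification or an SP-off-RP design: it assumes a generic binary logit in which one attribute `x` is correlated with an unobserved factor `q`, and a single instrument `z` (standing in for the RP attributes). The data generation process, the variable names, and the helper `fit_logit` are all hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data generation process (an assumption, for illustration only).
z = rng.normal(size=n)   # instrument: affects x, unrelated to the omitted factor
q = rng.normal(size=n)   # unobserved factor: enters both x and the utility
x = z + q                # endogenous attribute
beta_true = -1.0
u = beta_true * x + q + rng.logistic(size=n)  # utility difference, logistic error
y = (u > 0).astype(float)                     # observed binary choice

def fit_logit(X, y, iters=25):
    """Binary logit by Newton-Raphson; returns the coefficient vector."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        b = b + np.linalg.solve(hess, grad)
    return b

const = np.ones(n)

# Naive model: ignores the endogeneity of x.
beta_naive = fit_logit(np.column_stack([const, x]), y)[1]

# Stage 1: OLS of the endogenous attribute on the instrument; keep the residual.
Z = np.column_stack([const, z])
gamma, *_ = np.linalg.lstsq(Z, x, rcond=None)
resid = x - Z @ gamma

# Stage 2: logit including the first-stage residual as an extra regressor.
beta_cf = fit_logit(np.column_stack([const, x, resid]), y)[1]

print(f"true: {beta_true:.2f}  naive: {beta_naive:.2f}  control function: {beta_cf:.2f}")
```

In this simplified setup the first-stage residual captures the omitted factor almost exactly, so the second-stage coefficient on `x` should land close to the true value while the naive estimate is pulled towards zero. In more general settings the control function typically recovers coefficients only up to a change of scale, and second-stage standard errors need a correction (for example by bootstrapping) because the residual is itself estimated.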

[1]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[2]  Paul J Rathouz,et al.  Two-stage residual inclusion estimation: addressing endogeneity in health econometric modeling. , 2008, Journal of health economics.

[3]  David A. Hensher,et al.  Hypothetical bias, choice experiments and willingness to pay , 2010 .

[4]  D. Rivers,et al.  Limited Information Estimators and Exogeneity Tests for Simultaneous Probit Models , 1988 .

[5]  Cristián Angelo Guevara-Cue Endogeneity and Sampling of Alternatives in Spatial Choice Models , 2010 .

[6]  C. Angelo Guevara,et al.  Critical assessment of five methods to correct for endogeneity in discrete-choice models , 2015 .

[7]  Jeffrey M. Wooldridge Econometric Analysis of Cross Section and Panel Data , 2002 .

[8]  M. Ben-Akiva,et al.  Combining revealed and stated preferences data , 1994 .

[9]  C. Angelo Guevara,et al.  Overidentification tests for the exogeneity of instruments in discrete choice models , 2018, Transportation Research Part B: Methodological.

[10]  Cristián Angelo Guevara-Cue Addressing endogeneity in residential location models , 2005 .

[11]  Cristian Angelo Guevara,et al.  Change of Scale and Forecasting with the Control-Function Method in Logit Models , 2011, Transp. Sci..

[12]  Lung-fei Lee,et al.  Specification error in multinomial logit models : Analysis of the omitted variable bias , 1982 .

[13]  Greg M. Allenby,et al.  Incorporating Prior Knowledge into the Analysis of Conjoint Studies , 1995 .

[14]  Kenneth Train,et al.  Standard error correction in two-stage estimation with nested samples , 2003 .

[15]  P. Ruud Sufficient Conditions for the Consistency of Maximum Likelihood Estimation Despite Misspecifications of Distribution in Multinomial Discrete Choice Models , 1983 .

[16]  Kenneth E. Train,et al.  Discrete Choice Methods with Simulation , 2016 .

[17]  C. A. Guevara,et al.  Correcting for endogeneity due to omitted attributes in discrete-choice models: the multiple indicator solution , 2016 .

[18]  John M. Rose,et al.  Stated choice experimental design theory: the who, the what and the why , 2014 .

[19]  Caspar G. Chorus,et al.  Vacation behaviour under high travel cost conditions – A stated preference of revealed preference approach , 2014 .

[20]  A. Daly,et al.  MODELS USING MIXED STATED-PREFERENCE AND REVEALED-PREFERENCE INFORMATION. , 1991 .

[21]  M. Ben-Akiva,et al.  Endogeneity in Residential Location Choice Models , 2006 .

[22]  Kenneth Train,et al.  Monte Carlo Analysis of SP-off-RP Data , 2007 .

[23]  N. Shinghal,et al.  FREIGHT MODE CHOICE AND ADAPTIVE STATED PREFERENCES , 2002 .

[24]  J. Heckman Dummy Endogenous Variables in a Simultaneous Equation System , 1977 .

[25]  Moshe Ben-Akiva,et al.  Discrete Choice Analysis: Theory and Application to Travel Demand , 1985 .

[26]  K. Train,et al.  Estimation on stated-preference experiments constructed from revealed-preference choices , 2008 .

[27]  J. Terza Two-Stage Residual Inclusion Estimation in Health Services Research and Health Economics. , 2018, Health services research.