A comparison of the predictive ability of mode choice models with various levels of complexity

Abstract

Models of mode choice have recently been developed which include a large number of explanatory variables. The inclusion of some of these variables is evidently the result of trial-and-error analysis: the researcher tries various specifications until one is obtained that is consistent with a priori beliefs and fits the data fairly well. This method of model specification allows one to “learn” from the data, but is also open to the criticism that the resultant model simply reflects relations which happen to exist in the sample, rather than true behavioral relations. This paper examines this question. A complex model is presented which was developed after attempting a wide variety of specifications. The predictive ability of this model is compared with that of models with fewer variables, each of which could be included on the basis of a priori ideas. It is found that the complex model predicts best, indicating that the behavioral content of the model developed through “learning” from the data is greater than that of models specified on the basis of a priori beliefs.