Machine Learning Meets Microeconomics: The Case of Decision Trees and Discrete Choice

We provide a microeconomic framework for decision trees, a popular machine learning method. Specifically, we show how decision trees represent a non-compensatory decision protocol known as disjunctions-of-conjunctions, and how this protocol generalizes many of the non-compensatory rules used in the discrete choice literature so far. Additionally, we show how existing decision tree variants address many economic concerns that choice modelers might have. Beyond theoretical interpretations, we contribute to the literature on two-stage, semi-compensatory modeling and to the decision tree literature. In particular, we formulate the first Bayesian model tree, thereby allowing for uncertainty in the estimated non-compensatory rules as well as for context-dependent preference heterogeneity in one's second-stage choice model. Using an application of bicycle mode choice in the San Francisco Bay Area, we estimate our Bayesian model tree and find that it is over 1,000 times more likely to be closer to the true data-generating process than a multinomial logit model (MNL). Qualitatively, our Bayesian model tree automatically finds the effect of bicycle infrastructure investment to be moderated by travel distance, socio-demographics, and topography, and it identifies diminishing returns from bike lane investments. These qualitative differences lead to Bayesian model tree forecasts that directly align with the observed bicycle mode shares in regions with abundant bicycle infrastructure, such as Davis, CA and the Netherlands. In comparison, MNL's forecasts are overly optimistic.
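
The claim that a decision tree encodes a disjunctions-of-conjunctions rule can be illustrated with a minimal sketch. The toy tree below, its attribute names (`short_trip`, `has_bike_lane`, `flat_route`), and all helper functions are hypothetical and are not taken from the paper's model or data. Each root-to-leaf path ending in a positive leaf is read as one conjunction, and the set of such paths is read as their disjunction.

```python
from itertools import product

# Hypothetical toy tree (an assumption for illustration only).
# Internal nodes are (attribute, yes_branch, no_branch);
# leaves are booleans (True = "bicycling is considered").
TREE = ("short_trip",
        ("has_bike_lane", True, ("flat_route", True, False)),
        False)

def tree_predict(node, x):
    """Walk the tree from the root to a leaf for one observation x."""
    while not isinstance(node, bool):
        attr, yes, no = node
        node = yes if x[attr] else no
    return node

def tree_to_dnf(node, path=()):
    """Return one conjunction per True leaf.

    A conjunction is a tuple of (attribute, required_value) pairs
    collected along the path from the root to that leaf.
    """
    if isinstance(node, bool):
        return [path] if node else []
    attr, yes, no = node
    return (tree_to_dnf(yes, path + ((attr, True),))
            + tree_to_dnf(no, path + ((attr, False),)))

def dnf_predict(dnf, x):
    """True if at least one conjunction has all of its conditions met."""
    return any(all(x[a] == v for a, v in conj) for conj in dnf)

DNF = tree_to_dnf(TREE)

# Exhaustively compare the tree and its rule form on all 2^3 inputs.
ATTRS = ("short_trip", "has_bike_lane", "flat_route")
ALL_CASES = [dict(zip(ATTRS, vals))
             for vals in product([True, False], repeat=len(ATTRS))]
AGREE = all(tree_predict(TREE, x) == dnf_predict(DNF, x) for x in ALL_CASES)

if __name__ == "__main__":
    for conj in DNF:
        print(" AND ".join(f"{a}={v}" for a, v in conj))
    print("tree and rule set agree on every input:", AGREE)
```

In this sketch the rule set has two conjunctions, one per positive leaf, and it agrees with the tree on every possible input. No attribute can compensate for a failed condition on a path, which is the sense in which such a rule is non-compensatory.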
