The Holy Trinity: Blending Statistics, Machine Learning and Discrete Choice, with Applications to Strategic Bicycle Planning

Author(s): Brathwaite, Timothy | Advisor(s): Walker, Joan L | Abstract: Every day, decision-makers make choices among finite and discrete sets of alternatives. For example, people decide whether to walk, bike, take transit, or drive to work; shoppers decide which of the available brands of toothpaste to buy; and firms decide which vacant buildings they will rent for office space. Across these disparate domains, discrete choice models mathematically represent the procedures that analysts believe decision-makers use to make such choices.

Historically, the field of discrete choice modeling grew mainly out of economics, and this lineage has had long-lasting methodological ramifications. In particular, despite the great mathematical similarity between discrete choice models and models in statistics, machine learning, and causal inference, discrete choice research remains mostly siloed, seldom drawing from or contributing to methods in these related disciplines.

In this dissertation, we help demolish the methodological silo around discrete choice research. Drawing from recent techniques in statistics, machine learning, and causal inference, we remove substantive limitations on the decision-making processes that could be represented and predicted with previously available discrete choice methods. At the same time, by addressing concerns of discrete choice modelers, we make methodological contributions to the fields of statistics and machine learning, and we identify future research areas where discrete choice modelers are well suited to advancing the state of the art in causal inference.

Importantly, the methodological advances described above were not divorced from today's societal concerns.
Given that more and more government agencies are (unsuccessfully) attempting to raise bicycle commuting rates in their jurisdictions, we guide our interactions with the statistics, machine learning, and causal inference literatures by trying to model an individual's choice of commuting by bicycle more accurately. In particular, we use parametric link functions from statistics to better model the adoption and abandonment of bicycling. From machine learning, we use decision trees to represent the non-compensatory decision protocols that individuals appear to follow when deciding whether to commute by bicycle, and we use diagrams from the causal inference literature to gain insight into how we can better model the effects of bike lane investments on bicycle commute mode shares. Altogether, we not only make methodological contributions to the fields of discrete choice, statistics, machine learning, and causal inference, but we also contribute to the efforts of transportation planners and modelers who are trying to make our cities and regions more sustainable and environmentally friendly. The methods developed in this dissertation have applications to strategic bicycle planning, helping analysts understand when certain interventions are not enough to cause people to abandon non-bicycle modes of travel at the desired rates, and what alternative interventions might be more effective.

In total, the specific contributions of this dissertation are the following:

1. We create a new spatial unit of analysis (the zone of likely travel) for incorporating roadway-level variables, such as the presence and type of bicycle infrastructure, roadway slopes, and traffic speeds, into mode choice models.
2. We propose and demonstrate the novel use of decision-tree methods for directly including roadway-level variables in mode choice models.
3. We create a new class of closed-form, finite-parameter, multinomial choice models that avoid an undesirable symmetry property that we describe in Chapter 3.
4. We create a procedure for using this new class of models to extend many existing binary choice models to the multinomial setting for the first time.
5. We create methods for constructing new symmetric and asymmetric binary choice models.
6. We provide a microeconomic framework for interpreting decision trees by showing that decision trees represent a non-compensatory decision rule known as disjunctions-of-conjunctions, and that such rules generalize many of the non-compensatory rules used in the discrete choice literature so far.
7. We propose and estimate the first Bayesian model tree, thereby combining decision trees and discrete choice models in the first two-stage, semi-compensatory model that jointly (a) uses disjunctions-of-conjunctions for the choice-set generation stage, (b) allows for context-dependent preference heterogeneity in the choice stage, and (c) quantifies analyst uncertainty in the estimated disjunctions-of-conjunctions.
8. We identify techniques, such as the use of causal diagrams, that can be borrowed from the causal inference literature to improve the ability of discrete choice modelers to predict outcomes under external changes or policy interventions, such as investing in on-street bicycle lanes.
9. We identify areas of the causal inference literature that can be improved through the incorporation of techniques from discrete choice, or through the application of causal inference techniques that are highly relevant to discrete choice modelers yet only infrequently researched by traditional causal inference researchers.

Throughout this dissertation, we empirically demonstrate most of our contributions using commute mode choice data from the San Francisco Bay Area. In every case, we found that the new models developed as part of this dissertation fit our data better than traditional discrete choice models.
These results were stable across all measures of fit that were used, whether in-sample or out-of-sample, frequentist or Bayesian. Beyond fit, all of our new models also proved to be qualitatively different from traditional discrete choice methods: they provided insights and forecasts that both made more sense and were more accurate than those of their traditional counterparts. Finally, our contributions related to causal inference are the only items from the list above without empirical demonstrations. Instead, these contributions are supported by substantial literature review, discussion, and thought exercises that show the benefits, both general and bicycle-specific, of merging discrete choice and causal inference techniques.
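Contribution 5 distinguishes symmetric from asymmetric binary choice models. As an illustration of that distinction only (not the dissertation's own models), the sketch below compares the binary logit, whose link satisfies F(v) = 1 - F(-v), with the scobit link of Nagler (1994), one previously published asymmetric alternative. The utility index `v` and all function names are hypothetical choices for this example.

```python
import math
from typing import Callable

# v is a hypothetical systematic-utility index for "bicycle" vs. "not bicycle".


def logit_prob(v: float) -> float:
    """Binary logit: P(y = 1) = 1 / (1 + exp(-v)). Symmetric about v = 0."""
    return 1.0 / (1.0 + math.exp(-v))


def scobit_prob(v: float, alpha: float) -> float:
    """Scobit link (Nagler, 1994): P(y = 1) = 1 - (1 + exp(v)) ** (-alpha).

    alpha = 1 recovers the binary logit; other alpha > 0 skew the curve,
    so the probability approaches 0 and 1 at different rates.
    """
    return 1.0 - (1.0 + math.exp(v)) ** (-alpha)


def is_symmetric_at(prob: Callable[[float], float], v: float, tol: float = 1e-12) -> bool:
    """A link F is symmetric at v when F(v) + F(-v) = 1, i.e. F(v) = 1 - F(-v)."""
    return abs(prob(v) + prob(-v) - 1.0) < tol


if __name__ == "__main__":
    # Compare how each link maps the same utility values to probabilities.
    for v in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"v={v:+.1f}  logit={logit_prob(v):.3f}  "
              f"scobit(alpha=0.5)={scobit_prob(v, 0.5):.3f}")
```

A symmetric link forces the probability to approach 0 and 1 at mirror-image rates; an asymmetric link such as scobit relaxes that constraint through its extra parameter.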
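Contribution 6 states that decision trees represent a disjunctions-of-conjunctions rule. A minimal sketch of why, assuming a hypothetical tree with boolean leaves ("bicycle is considered" or not): every root-to-leaf path ending in True is a conjunction of threshold conditions, and because the tree outputs True exactly when one of those paths is followed, the tree is equivalent to the OR of those conjunctions. The tree encoding, feature names, and thresholds below are illustrative assumptions.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: an internal node is (feature, threshold, left, right),
# where "left" is followed when x[feature] <= threshold; a leaf is a bool meaning
# "bicycle is considered" (True) or "bicycle is screened out" (False).
Node = Union[bool, Tuple[str, float, "Node", "Node"]]
Condition = Tuple[str, str, float]  # (feature, "<=" or ">", threshold)
Conjunction = Tuple[Condition, ...]


def tree_to_doc(node: Node, path: Conjunction = ()) -> List[Conjunction]:
    """Collect one conjunction (root-to-leaf path) per True leaf.

    The tree outputs True exactly when x satisfies at least one of these
    conjunctions, so the returned list is a disjunction-of-conjunctions rule.
    """
    if isinstance(node, bool):
        return [path] if node else []
    feature, threshold, left, right = node
    return (tree_to_doc(left, path + ((feature, "<=", threshold),))
            + tree_to_doc(right, path + ((feature, ">", threshold),)))


def satisfies(x: Dict[str, float], conj: Conjunction) -> bool:
    """True if x meets every condition in the conjunction (logical AND)."""
    return all(x[f] <= t if op == "<=" else x[f] > t for f, op, t in conj)


def doc_rule(x: Dict[str, float], rules: List[Conjunction]) -> bool:
    """True if any conjunction is satisfied (logical OR)."""
    return any(satisfies(x, conj) for conj in rules)


def tree_predict(node: Node, x: Dict[str, float]) -> bool:
    """Evaluate the tree directly, for comparison with the extracted rule."""
    while not isinstance(node, bool):
        feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return node


# Hypothetical screening tree over two illustrative commute attributes.
EXAMPLE_TREE: Node = (
    "distance_km", 8.0,
    ("bike_lane_share", 0.3,
     ("distance_km", 3.0, True, False),
     True),
    False,
)

if __name__ == "__main__":
    for conj in tree_to_doc(EXAMPLE_TREE):
        print(" AND ".join(f"{f} {op} {t}" for f, op, t in conj))
```

For this toy tree the extracted rule has two conjunctions, joined by OR: "distance_km <= 8.0 AND bike_lane_share <= 0.3 AND distance_km <= 3.0" and "distance_km <= 8.0 AND bike_lane_share > 0.3".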
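Contribution 7 describes a two-stage, semi-compensatory model: a non-compensatory rule generates the choice set, and a compensatory model chooses among the survivors. The sketch below shows only that bare structure under strong simplifying assumptions: screening is deterministic and supplied by the caller, and the choice stage is a plain multinomial logit with fixed, hypothetical utilities. The dissertation's Bayesian model tree additionally treats the screening rules as uncertain and allows context-dependent preference heterogeneity; none of that is reproduced here.

```python
import math
from typing import Callable, Dict


def two_stage_probs(utilities: Dict[str, float],
                    considered: Callable[[str], bool]) -> Dict[str, float]:
    """Simplified two-stage, semi-compensatory choice probabilities.

    Stage 1 (non-compensatory): keep only alternatives passing the screen.
    Stage 2 (compensatory): multinomial logit over the kept alternatives.
    Screened-out alternatives receive probability 0.
    """
    kept = [alt for alt in utilities if considered(alt)]
    if not kept:
        raise ValueError("the screening stage removed every alternative")
    # Subtract the largest kept utility before exponentiating, for numerical stability.
    top = max(utilities[alt] for alt in kept)
    weights = {alt: math.exp(utilities[alt] - top) for alt in kept}
    total = sum(weights.values())
    return {alt: weights.get(alt, 0.0) / total for alt in utilities}


if __name__ == "__main__":
    # Hypothetical utilities and a hypothetical screen under which this
    # traveler does not consider bicycling or walking for a long commute.
    v = {"drive": 0.5, "transit": 0.0, "bike": 0.2, "walk": -1.0}
    print(two_stage_probs(v, lambda alt: alt in {"drive", "transit"}))
```

When every alternative passes the screen, the function reduces to an ordinary multinomial logit, which is why such models are described as generalizing the fully compensatory case.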

[1]  G. Tutz,et al.  An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. , 2009, Psychological methods.

[2]  Thomas P. Minka,et al.  Bayesian model averaging is not model combination , 2002 .

[3]  Shinji Teraji,et al.  Why Bounded Rationality , 2018 .

[4]  Dipak K. Dey,et al.  A new class of flexible link functions with application to species co-occurrence in cape floristic region , 2013, 1401.1915.

[5]  Elias Bareinboim,et al.  Transportability from Multiple Environments with Limited Experiments: Completeness Results , 2014, NIPS.

[6]  Edward E. Leamer,et al.  Sensitivity Analyses Would Help , 1985 .

[7]  Harry Timmermans,et al.  Scobit-Based Panel Analysis of Multitasking Behavior of Public Transport Users , 2010 .

[8]  A. Philip Dawid,et al.  Beware of the DAG! , 2008, NIPS Causality: Objectives and Assessment.

[9]  T. Stukel Generalized Logistic Models , 1988 .

[10]  Joffre Swait,et al.  Choice models based on mixed discrete/continuous PDFs , 2009 .

[11]  유정수,et al.  어닐링에 의한 Hierarchical Mixtures of Experts를 이용한 시계열 예측 , 1998 .

[12]  Caspar G. Chorus,et al.  Random regret minimization for consumer choice modeling: Assessment of empirical evidence , 2013 .

[13]  Benjamin Heydecker,et al.  A discrete choice model incorporating thresholds forperception in attribute values , 2006 .

[14]  Eibe Frank,et al.  Logistic Model Trees , 2003, Machine Learning.

[15]  Michael I. Jordan,et al.  Learning with Mixtures of Trees , 2001, J. Mach. Learn. Res..

[16]  M. Pratola Efficient Metropolis–Hastings Proposal Mechanisms for Bayesian Regression Tree Models , 2013, 1312.1895.

[17]  Joel Huber,et al.  Adapting Cutoffs to the Choice Environment: The Effects of Attribute Correlation and Reliability , 1991 .

[18]  Heleno Bolfarine,et al.  A Framework for Skew-Probit Links in Binary Regression , 2010 .

[19]  Elias Bareinboim,et al.  Transportability of Causal Effects: Completeness Results , 2012, AAAI.

[20]  M. Hernán,et al.  Compound Treatments and Transportability of Causal Inference , 2011, Epidemiology.

[21]  Ricard Gavaldà,et al.  Identifiability and transportability in dynamic causal networks , 2016, International Journal of Data Science and Analytics.

[22]  B. Efron,et al.  Bootstrap confidence intervals , 1996 .

[23]  Marina Velikova,et al.  Decision trees for monotone price models , 2004, Comput. Manag. Sci..

[24]  Margo I. Seltzer,et al.  Learning Certifiably Optimal Rule Lists , 2017, KDD.

[25]  H. Simon,et al.  A Behavioral Model of Rational Choice , 1955 .

[26]  Michael Schlosser,et al.  Non-Linear Decision Trees - NDT , 1996, ICML.

[27]  D. Rubin For objective causal inference, design trumps analysis , 2008, 0811.1640.

[28]  J. Ortúzar,et al.  A semi-compensatory discrete choice model with explicit attribute thresholds of perception , 2005 .

[29]  Michael Braun,et al.  Scalable Rejection Sampling for Bayesian Hierarchical Models , 2014, Mark. Sci..

[30]  Bradley P. Carlin,et al.  Bayesian measures of model complexity and fit , 2002 .

[31]  W. Recker,et al.  Discrete choice with an oddball alternative , 1995 .

[32]  David S. Wishart,et al.  Applications of Machine Learning in Cancer Prediction and Prognosis , 2006, Cancer informatics.

[33]  Ethem Alpaydin,et al.  Bagging Soft Decision Trees , 2016, Machine Learning for Health Informatics.

[34]  R. Kohli,et al.  Probabilistic Subset-Conjunctive Models for Heterogeneous Consumers , 2005 .

[35]  Shoichiro Nakayama,et al.  Unified closed-form expression of logit and weibit and its extension to a transportation network equilibrium assignment , 2015 .

[36]  Tolga Tasdizen,et al.  Disjunctive normal random forests , 2015, Pattern Recognit..

[37]  Dennis D. Boos,et al.  The IOS Test for Model Misspecification , 2004 .

[38]  R. Fisher,et al.  On the Mathematical Foundations of Theoretical Statistics , 1922 .

[39]  R. Tibshirani,et al.  Model Search by Bootstrap “Bumping” , 1999 .

[40]  Claudia Czado,et al.  On Link Selection in Generalized Linear Models , 1992 .

[41]  Wes McKinney,et al.  Data Structures for Statistical Computing in Python , 2010, SciPy.

[42]  Terry Elrod,et al.  A new integrated model of noncompensatory and compensatory decision strategies , 2004 .

[43]  Jonathan Levin,et al.  The Data Revolution and Economic Analysis , 2013, Innovation Policy and the Economy.

[44]  Chandra R. Bhat,et al.  Accommodating variations in responsiveness to level-of-service measures in travel mode choice modeling , 1998 .

[45]  Fred L. Hall,et al.  Spatial transferability of an ordered response model of trip generation , 1997 .

[46]  Peter Spirtes,et al.  Introduction to Causal Inference , 2010, J. Mach. Learn. Res..

[47]  Kenneth A Bollen,et al.  10. Using Instrumental Variable Tests to Evaluate Model Specification in Latent Variable Structural Equation Models , 2009, Sociological methodology.

[48]  James F. Foerster,et al.  Mode choice decision process models: A comparison of compensatory and non-compensatory structures , 1979 .

[49]  Fred M. Feinberg,et al.  Reality Check: Combining Choice Experiments with Market Data to Estimate the Importance of Product Attributes , 2010, Manag. Sci..

[50]  Joffre Swait,et al.  A NON-COMPENSATORY CHOICE MODEL INCORPORATING ATTRIBUTE CUTOFFS , 2001 .

[51]  Hea-Jung Kim BINARY REGRESSION WITH A CLASS OF SKEWED t LINK MODELS , 2002 .

[52]  Judea Pearl,et al.  The Do-Calculus Revisited , 2012, UAI.

[53]  J. Sargan THE ESTIMATION OF ECONOMIC RELATIONSHIPS USING INSTRUMENTAL VARIABLES , 1958 .

[54]  R. Keener Theoretical Statistics: Topics for a Core Course , 2010 .

[55]  A. Abbott Transcending General Linear Reality , 1988 .

[56]  Jennifer L. Hill,et al.  Bayesian Nonparametric Modeling for Causal Inference , 2011 .

[57]  C. Angelo Guevara,et al.  Critical assessment of five methods to correct for endogeneity in discrete-choice models , 2015 .

[58]  Sreerama K. Murthy,et al.  Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey , 1998, Data Mining and Knowledge Discovery.

[59]  Felipe González,et al.  A Maximum Entropy Estimator for the Aggregate Hierarchical Logit Model , 2011, Entropy.

[60]  Mijung Kim Two-stage logistic regression model , 2009, Expert Syst. Appl..

[61]  Mark D. Reid,et al.  Composite Binary Losses , 2009, J. Mach. Learn. Res..

[62]  M. Ben-Akiva,et al.  EMPIRICAL TEST OF A CONSTRAINED CHOICE DISCRETE MODEL : MODE CHOICE IN SAO PAULO, BRAZIL , 1987 .

[63]  Nuno Vasconcelos,et al.  Variable margin losses for classifier design , 2010, NIPS.

[64]  W. Loh,et al.  LOTUS: An Algorithm for Building Accurate and Comprehensible Logistic Regression Trees , 2004 .

[65]  A. J. Feelders,et al.  Classification trees for problems with monotonicity constraints , 2002, SKDD.

[66]  Michael A. West,et al.  Bayesian CART: Prior Specification and Posterior Simulation , 2007 .

[67]  Donato Malerba,et al.  A Comparative Analysis of Methods for Pruning Decision Trees , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[68]  T. Hesterberg,et al.  Weighted Average Importance Sampling and Defensive Mixture Distributions , 1995 .

[69]  A. Tversky,et al.  Rational choice and the framing of decisions , 1990 .

[70]  Dennis E. Jennings Judging Inference Adequacy in Logistic Regression , 1986 .

[71]  Lior Rokach,et al.  Ensemble-based classifiers , 2010, Artificial Intelligence Review.

[72]  I. Goleț Symmetric and Asymmetric Binary Choice Models for Corporate Bankruptcy , 2014 .

[73]  P. Green Reversible jump Markov chain Monte Carlo computation and Bayesian model determination , 1995 .

[74]  Joan L. Walker,et al.  Preference endogeneity in discrete choice models , 2014 .

[75]  K. Train,et al.  A Control Function Approach to Endogeneity in Consumer Choice Models , 2010 .

[76]  S. Eguchi,et al.  An asymmetric logistic regression model for ecological data , 2016 .

[77]  Mark Steyvers,et al.  Choosing a Strictly Proper Scoring Rule , 2013, Decis. Anal..

[78]  Luke W. Miratrix,et al.  To Adjust or Not to Adjust? Sensitivity Analysis of M-Bias and Butterfly-Bias , 2014, 1408.0324.

[79]  Mark W. Schmidt,et al.  Modeling Discrete Interventional Data using Directed Cyclic Graphical Models , 2009, UAI.

[80]  Chandra R. Bhat,et al.  A Comprehensive Dwelling Unit Choice Model Accommodating Psychological Constructs within a Search Strategy for Consideration Set Formation , 2015 .

[81]  T. Santner,et al.  Orthogonalizing parametric link transformation families in binary regression analysis , 1992 .

[82]  Akshay Vij,et al.  Incorporating the influence of latent modal preferences on travel mode choice behavior , 2013 .

[83]  James J. Heckman,et al.  1. The Scientific Model of Causality , 2005 .

[84]  Joffre Swait,et al.  Choice set generation within the generalized extreme value family of discrete choice models , 2001 .

[85]  David A. Hensher,et al.  Embedding Decision Heuristics in Discrete Choice Models: A Review , 2012 .

[86]  Peter E. Rossi,et al.  Marketing models of consumer heterogeneity , 1998 .

[87]  Gaël Varoquaux,et al.  The NumPy Array: A Structure for Efficient Numerical Computation , 2011, Computing in Science & Engineering.

[88]  Santtu Tikka,et al.  Identifying Causal Effects with the R Package causaleffect , 2017, 1806.07161.

[89]  Florian Heiss,et al.  Discrete Choice Methods with Simulation , 2016 .

[90]  T. Evgeniou,et al.  Disjunctions of Conjunctions, Cognitive Simplicity, and Consideration Sets , 2010 .

[91]  M. J. Laan,et al.  Targeted Learning: Causal Inference for Observational and Experimental Data , 2011 .

[92]  Jeremy M. G. Taylor The Cost of Generalizing Logistic Regression , 1988 .

[93]  Felix Elwert,et al.  Graphical Causal Models , 2013 .

[94]  B. Morgan Extended Models for Quantal Response Data , 1988 .

[95]  Víctor M. Guerrero,et al.  Use of the Box-Cox transformation with binary response models , 1982 .

[96]  Ashutosh Kumar Singh,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2010 .

[97]  Moshe Ben-Akiva,et al.  STRUCTURE OF PASSENGER TRAVEL DEMAND MODELS , 1974 .

[98]  C. Manski Partial Identification of Probability Distributions , 2003 .

[99]  Tom Lodewyckx,et al.  Bayesian Versus Frequentist Inference , 2008 .

[100]  M. Bierlaire,et al.  Discrete choice models with multiplicative error terms , 2009 .

[101]  Scott A. Sisson,et al.  Transdimensional Markov Chains , 2005 .

[102]  Louis Wehenkel,et al.  A complete fuzzy decision tree technique , 2003, Fuzzy Sets Syst..

[103]  J. Pearl,et al.  EIGHT MYTHS ABOUT CAUSALITY AND STRUCTURAL EQUATION MODELS , 2013 .

[104]  A. Dawid,et al.  Statistical Causality from a Decision-Theoretic Perspective , 2014, 1405.2292.

[105]  Christopher Winship,et al.  Counterfactuals and Causal Inference: Methods and Principles for Social Research , 2007 .

[106]  Silvia Angela Osmetti,et al.  Modelling small and medium enterprise loan defaults as rare events: the generalized extreme value regression model , 2013 .

[107]  Joseph N. Wilson,et al.  Twenty Years of Mixture of Experts , 2012, IEEE Transactions on Neural Networks and Learning Systems.

[108]  Peter Buhlmann,et al.  BOOSTING ALGORITHMS: REGULARIZATION, PREDICTION AND MODEL FITTING , 2007, 0804.2752.

[109]  J. Pearl Causal diagrams for empirical research , 1995 .

[110]  Susan Athey,et al.  Recursive partitioning for heterogeneous causal effects , 2015, Proceedings of the National Academy of Sciences.

[111]  W. Kamakura,et al.  Modeling Preference and Structural Heterogeneity in Consumer Choice , 1996 .

[112]  Lior Rokach,et al.  Top-down induction of decision trees classifiers - a survey , 2005, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[113]  Christophe Marsala,et al.  Rank discrimination measures for enforcing monotonicity in decision tree induction , 2015, Inf. Sci..

[114]  Joel H. Steckel,et al.  A Heterogeneous Conditional Logit Model of Choice , 1988 .

[115]  R. Prentice,et al.  A generalization of the probit and logit methods for dose response curves. , 1976, Biometrics.

[116]  E. Cascetta,et al.  Dominance among alternatives in random utility models , 2009 .

[117]  Xin Yan,et al.  Facilitating score and causal inference trees for large observational studies , 2012, J. Mach. Learn. Res..

[118]  Greg Marsden,et al.  Insights on disruptions as opportunities for transport policy change , 2013 .

[119]  Marco Valtorta,et al.  Pearl's Calculus of Intervention Is Complete , 2006, UAI.

[120]  Carlo Giacomo Prato,et al.  Closing the gap between behavior and models in route choice: The role of spatiotemporal constraints and latent traits in choice set formation , 2012 .

[121]  F. Yates THE USE OF TRANSFORMATIONS AND MAXIMUM LIKELIHOOD IN THE ANALYSIS OF QUANTAL EXPERIMIENTS INVOLVING TWO TREATMENTS , 1955 .

[122]  Vasant Honavar,et al.  Transportability from Multiple Environments with Limited Experiments , 2013, NIPS.

[123]  Gary E. Bolton,et al.  INCENTIVE-ALIGNED CONJOINT ANALYSIS , 2004 .

[124]  C. Manski The structure of random utility models , 1977 .

[125]  Philip L. H. Yu,et al.  Logit tree models for discrete choice data with application to advice-seeking preferences among Chinese Christians , 2016, Comput. Stat..

[126]  Thomas F. Golob,et al.  Structural Equation Modeling For Travel Behavior Research , 2001 .

[127]  Maya L Petersen,et al.  Compound treatments, transportability, and the structural causal model: the power and simplicity of causal graphs. , 2011, Epidemiology.

[128]  Elizabeth A Stuart,et al.  Improving propensity score weighting using machine learning , 2010, Statistics in medicine.

[129]  John W. Polak,et al.  Simplified probabilistic choice set formation models in a residential location choice context , 2013 .

[130]  Mijung Kim,et al.  Two-stage multinomial logit model , 2011, Expert Syst. Appl..

[131]  W. Vijverberg,et al.  Betit: A Family that Nests Probit and Logit , 2000, SSRN Electronic Journal.

[132]  A. Bronner,et al.  Decision styles in transport mode choice , 1982 .

[133]  C. Manski Daniel McFadden and the Econometric Analysis of Discrete Choice , 2001 .

[134]  Cynthia Rudin,et al.  Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model , 2015, ArXiv.

[135]  M. Fosgerau,et al.  Mode choice endogeneity in value of travel time estimation , 2010 .

[136]  Simon Shepherd,et al.  A review of system dynamics models applied in transportation , 2014 .

[137]  Q. Vuong Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses , 1989 .

[138]  Susan Athey,et al.  The State of Applied Econometrics - Causality and Policy Evaluation , 2016, 1607.00699.

[139]  Roger Koenker,et al.  Parametric links for binary choice models: A Fisherian-Bayesian colloquy , 2009 .

[140]  Shlomo Bekhor,et al.  Development and estimation of a semi-compensatory model with a flexible error structure , 2012 .

[141]  G. Casella,et al.  Penalized regression, standard errors, and Bayesian lassos , 2010 .

[142]  David A. Hensher,et al.  Contrasts of Relative Advantage Maximisation with Random Utility Maximisation and Regret Minimisation , 2015 .

[143]  R. Kohli,et al.  Representation and Inference of Lexicographic Preference Models and Their Variants , 2007 .

[144]  Pradeep K. Chintagunta,et al.  Semiparametric Estimation of Brand Choice Behavior , 2002 .

[145]  S A Goldsmith,et al.  NATIONAL BICYCLING AND WALKING STUDY. CASE STUDY NO. 1: REASONS WHY BICYCLING AND WALKING ARE AND ARE NOT BEING USED MORE EXTENSIVELY AS TRAVEL MODES , 1992 .

[146]  Joshua D. Angrist,et al.  Mostly Harmless Econometrics: An Empiricist's Companion , 2008 .

[147]  A. Gelman Iterative and Non-iterative Simulation Algorithms , 2006 .

[148]  Baibing Li The multinomial logit model revisited: A semi-parametric approach in discrete choice analysis , 2011 .

[149]  Caspar G. Chorus,et al.  Random Regret Minimization: An Overview of Model Properties and Empirical Evidence , 2012 .

[150]  Dennis H. Gensch,et al.  Targeting the Switchable Industrial Customer , 1984 .

[151]  Andreas Holzinger,et al.  Data Mining with Decision Trees: Theory and Applications , 2015, Online Inf. Rev..

[152]  Jonathan Nagler,et al.  Scobit: An Alternative Estimator to Logit and Probit , 1994 .

[153]  R. Olshen,et al.  Almost surely consistent nonparametric regression from recursive partitioning schemes , 1984 .

[154]  K. Hornik,et al.  Model-Based Recursive Partitioning , 2008 .

[155]  Moshe Ben-Akiva,et al.  Discrete Choice Analysis: Theory and Application to Travel Demand , 1985 .

[156]  C. Chorus Paving the way towards superstar destinations: Models of convex demand for quality , 2018 .

[157]  A. Raftery,et al.  Strictly Proper Scoring Rules, Prediction, and Estimation , 2007 .

[158]  Harry Timmermans,et al.  Cognitive Process Model of Individual Choice Behaviour Incorporating Principles of Bounded Rationality and Heterogeneous Decision Heuristics , 2010 .

[159]  D. Dey,et al.  A New Skewed Link Model for Dichotomous Quantal Response Data , 1999 .

[160]  John Eltinge,et al.  Building Consistent Regression Trees From Complex Sample Data , 2011 .

[161]  Ronald L. Rivest,et al.  Learning decision lists , 2004, Machine Learning.

[162]  R. Dawes SOCIAL SELECTION BASED ON MULTIDIMENSIONAL CRITERIA. , 1964, Journal of abnormal psychology.

[163]  Sunil Vadera,et al.  A survey of cost-sensitive decision tree induction algorithms , 2013, CSUR.

[164]  Patricia L. Mokhtarian,et al.  Viewpoint: Quantifying residential self-selection effects: A review of methods and findings from applications of propensity score and sample selection approaches , 2016 .

[165]  J. Horowitz Semiparametric and Nonparametric Methods in Econometrics , 2007 .

[166]  Alex Alves Freitas,et al.  A Survey of Evolutionary Algorithms for Decision-Tree Induction , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[167]  M. Chikaraishi,et al.  Analysis of Tourism Generation Incorporating the Influence of Constraints Based on a Scobit Model , 2012 .

[168]  K. Train,et al.  Joint mixed logit models of stated and revealed preferences for alternative-fuel vehicles , 1999, Controlling Automobile Air Pollution.

[169]  J. Berkson MINIMUM CHI-SQUARE, NOT MAXIMUM LIKELIHOOD! , 1980 .

[170]  S. Lemon,et al.  Classification and regression tree analysis in public health: Methodological review and comparison with logistic regression , 2003, Annals of behavioral medicine : a publication of the Society of Behavioral Medicine.

[171]  Min Ding An Incentive-Aligned Mechanism for Conjoint Analysis , 2007 .

[172]  Alan Manning,et al.  The Credibility Revolution in Empirical Economics: How Better Research Design is Taking the Con Out of Econometrics , 2010, SSRN Electronic Journal.

[173]  Vasant Honavar,et al.  m-Transportability: Transportability of a Causal Effect from Multiple Environments , 2013, AAAI.

[174]  Carly R. Knight,et al.  The Causal Implications of Mechanistic Thinking: Identification Using Directed Acyclic Graphs (DAGs) , 2013 .

[175]  Simon Kasif,et al.  A System for Induction of Oblique Decision Trees , 1994, J. Artif. Intell. Res..

[176]  J. Pearl Causality: Models, Reasoning and Inference , 2000 .

[177]  Raja Sengupta,et al.  The San Francisco Travel Quality Study: tracking trials and tribulations of a transit taker , 2017 .

[178]  Parametric link modification of both tails in binary regression , 1994 .

[179]  D. McFadden Conditional logit analysis of qualitative choice behavior , 1972 .

[180]  Robert Tibshirani,et al.  An Introduction to the Bootstrap , 1994 .

[181]  Michael G.H. Bell,et al.  System dynamics applicability to transportation modeling , 1994 .

[182]  Kevin P. Murphy,et al.  Machine learning - a probabilistic perspective , 2012, Adaptive computation and machine learning series.

[183]  Yu-Chiun Chiou,et al.  Willingness-to-pay for a bus fare reform: A contingent valuation approach with multiple bound dichotomous choices , 2017 .

[184]  A. Tversky Elimination by aspects: A theory of choice. , 1972 .

[185]  C. Hennig,et al.  Some thoughts about the design of loss functions , 2007 .

[186]  Janette Sadik-Khan,et al.  Streetfight: Handbook for an Urban Revolution , 2016 .

[187]  Xiaogang Su,et al.  Tree‐based model checking for logistic regression , 2007, Statistics in medicine.

[188]  S. Lauritzen,et al.  Chain graph models and their causal interpretations , 2002 .

[189]  K. Train,et al.  Mixed Logit with Repeated Choices: Households' Choices of Appliance Efficiency Level , 1998, Review of Economics and Statistics.

[190]  L. Keele The Statistics of Causal Inference: A View from Political Methodology , 2015, Political Analysis.

[191]  Halbert White,et al.  Settable Systems: An Extension of Pearl's Causal Model with Optimization, Equilibrium, and Learning , 2009, J. Mach. Learn. Res..

[192]  A. Dawid The geometry of proper scoring rules , 2007 .

[193]  Denis Nekipelov,et al.  Demand Estimation with Machine Learning and Model Combination , 2015 .

[194]  F. Martínez,et al.  The constrained multinomial logit: A semi-compensatory choice model , 2009 .

[195]  Paul E. Green,et al.  Completely Unacceptable Levels in Conjoint Analysis: A Cautionary Note , 1988 .

[196]  Dan Steinberg,et al.  THE HYBRID CART-LOGIT MODEL IN CLASSIFICATION AND DATA MINING , 1998 .

[197]  J. Fox Temporal transferability of mode-destination choice models , 2015 .

[198]  Kosuke Imai,et al.  Causal Inference With General Treatment Regimes , 2004 .

[199]  Khandker Nurul Habib,et al.  Myopic choice or rational decision making? An investigation into mode choice preference structures in competitive modal arrangements in a multimodal urban area, the City of Toronto , 2016 .

[200]  Chandra R. Bhat,et al.  Modeling the choice continuum: an integrated model of residential location, auto ownership, bicycle ownership, and commute tour mode choice decisions , 2011 .

[201]  Gerhard Paass,et al.  Model Switching for Bayesian Classification Trees with Soft Splits , 1998, PKDD.

[202]  Chu-Ping C. Vijverberg,et al.  Pregibit: A Family of Discrete Choice Models , 2012, SSRN Electronic Journal.

[203]  Andrew Daly,et al.  Allowing for heterogeneous decision rules in discrete choice models: an approach and four case studies , 2011 .

[204]  Soft Classification Trees , 2012 .

[205]  B. McKenzie,et al.  Modes Less Traveled—Bicycling and Walking to Work in the United States: 2008–2012 , 2014 .

[206]  Andrew Gelman,et al.  Multilevel (Hierarchical) Modeling: What It Can and Cannot Do , 2006, Technometrics.

[207]  Salvatore Ruggieri,et al.  Enumerating Distinct Decision Trees , 2017, ICML.

[208]  Qinghua Hu,et al.  Rank Entropy-Based Decision Trees for Monotonic Classification , 2012, IEEE Transactions on Knowledge and Data Engineering.

[209]  Aydin Alptekinoglu,et al.  The Exponomial Choice Model: A New Alternative for Assortment and Price Optimization , 2015, Oper. Res..

[210]  M. Newton Approximate Bayesian-inference With the Weighted Likelihood Bootstrap , 1994 .

[211]  Andrew Gelman,et al.  Fitting Multilevel Models When Predictors and Group Effects Correlate , 2007 .

[212]  Moshe Ben-Akiva,et al.  Incorporating random constraints in discrete models of choice set generation , 1987 .

[213]  Isa Steinmann,et al.  Mastering 'Metrics: The Path from Cause to Effect , 2015 .

[214]  P. Holland Statistics and Causal Inference , 1985 .

[215]  J. R. Quinlan Probabilistic decision trees , 1990 .

[216]  P. Viswanath,et al.  Ensemble of randomized soft decision trees for robust classification , 2016 .

[217]  D. Pregibon Resistant fits for some commonly used logistic models with medical application. , 1982, Biometrics.

[218]  Dipak K. Dey,et al.  Generalized extreme value regression for binary response data: An application to B2B electronic payments system adoption , 2011, 1101.1373.

[219]  Maciej Liskiewicz,et al.  Robust causal inference using Directed Acyclic Graphs: the R package , 2018 .

[220]  A. Buja,et al.  Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications , 2005 .

[221]  Ta Theo Arentze,et al.  Parametric Action Decision Trees: Incorporating Continuous Attribute Variables Into Rule-Based Models of Discrete Choice , 2007 .

[222]  Alex A. T. Bui,et al.  Motivating the Additional Use of External Validity: Examining Transportability in a Model of Glioblastoma Multiforme , 2014, AMIA.

[223]  Stephen G. Walker,et al.  Bayesian inference with misspecified models , 2013 .

[224] J. Marschak. Binary Choice Constraints on Random Utility Indicators. 1959.

[225] J. Pearl et al. Causal diagrams for epidemiologic research. 1999, Epidemiology.

[226] Shlomo Bekhor et al. Two-Stage Model for Jointly Revealing Determinants of Noncompensatory Conjunctive Choice Set Formation and Compensatory Choice. 2009.

[227] Harry J. P. Timmermans et al. A learning-based transportation oriented simulation system. 2004.

[228] Ken McLeod. Where We Ride: Analysis of Bicycle Commuting in American Cities. 2014.

[229] Werner Brög et al. Individualized Marketing: Implications for Transportation Demand Management. 1998.

[230] Ana M. Bianco et al. Robust Estimation in the Logistic Regression Model. 1996.

[231] Jacques Wainer et al. Comparison of 14 different families of classification algorithms on 115 binary datasets. 2016, arXiv.

[232] R. Olshen et al. Consistent nonparametric regression from recursive partitioning schemes. 1980.

[233] D. Pregibon. Goodness of Link Tests for Generalized Linear Models. 1980.

[234] D. McFadden. Econometric Models for Probabilistic Choice Among Products. 1980.

[235] Per Olov Lindberg et al. Extreme values, invariance and choice probabilities. 2011.

[236] Julien Mairal et al. Optimization with Sparsity-Inducing Penalties. 2011, Found. Trends Mach. Learn.

[237] Mark J. van der Laan et al. Targeted Maximum Likelihood Based Causal Inference. 2010.

[238] L. Hansen. Large Sample Properties of Generalized Method of Moments Estimators. 1982.

[239] William Young. A Non-Tradeoff Decision Making Model of Residential Location Choice. 1982.

[240] Gareth O. Roberts et al. A General Framework for the Parametrization of Hierarchical Models. 2007, arXiv:0708.3797.

[241] Stephen P. Ryan et al. Machine Learning Methods for Demand Estimation. 2015.

[242] D. McFadden. Disaggregate Behavioral Travel Demand's RUM Side: A 30-Year Retrospective. 2000.

[243] Elias Bareinboim et al. Transportability of Causal and Statistical Relations: A Formal Approach. 2011, 2011 IEEE 11th International Conference on Data Mining Workshops.

[244] Ralph Buehler et al. Making Cycling Irresistible: Lessons from The Netherlands, Denmark and Germany. 2008.

[245] A. Jenkinson. The frequency distribution of the annual maximum (or minimum) values of meteorological elements. 1955.

[246] Edward E. Leamer et al. Let's Take the Con Out of Econometrics. 1983.

[247] R. Lalonde. Evaluating the Econometric Evaluations of Training Programs with Experimental Data. 1984.

[248] D. McFadden. Measurement of Urban Travel Demand. 1974.

[249] Shie Mannor et al. Sparse Algorithms Are Not Stable: A No-Free-Lunch Theorem. 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[250] Elias Bareinboim et al. External Validity: From Do-Calculus to Transportability Across Populations. 2014, Probabilistic and Causal Inference.

[251] Senén Barro et al. Do we need hundreds of classifiers to solve real world classification problems? 2014, J. Mach. Learn. Res.

[252] Christopher Winship et al. Endogenous Selection Bias: The Problem of Conditioning on a Collider Variable. 2014, Annual Review of Sociology.

[253] Greg M. Allenby et al. A Choice Model with Conjunctive, Disjunctive, and Compensatory Screening Rules. 2004.

[254] Qinghua Hu et al. Multivariate decision trees with monotonicity constraints. 2016, Knowl.-Based Syst.

[255] Xinyu Cao et al. Examining the Impacts of Residential Self-Selection on Travel Behaviour: A Focus on Empirical Findings. 2009.

[256] C. Bhat. A heteroscedastic extreme value model of intercity travel mode choice. 1995.

[257] S. P. Pederson et al. On Robustness in the Logistic Regression Model. 1993.

[258] Raul Cano. On the Bayesian Bootstrap. 1992.

[259] John K. Dagsvik et al. Invariance axioms and functional form restrictions in structural models. 2017, Math. Soc. Sci.

[260] J. Pearl. Causal inference in statistics: An overview. 2009.

[261] Wiktor L. Adamowicz et al. Modeling non-compensatory preferences in environmental valuation. 2015.

[262] Enrique Castillo et al. Closed form expressions for choice probabilities in the Weibull case. 2008.

[263] Carlos F. Daganzo et al. Multinomial Probit: The Theory and its Application to Demand Forecasting. 1980.

[264] R. L. Winkler. Evaluating probabilities: asymmetric scoring rules. 1994.

[265] David A. Freedman et al. Statistics and the Scientific Method. 1985.

[266] Jin Tian et al. On the Testable Implications of Causal Models with Hidden Variables. 2002, UAI.

[267] Francisco Herrera et al. Monotonic Random Forest with an Ensemble Pruning Mechanism based on the Degree of Monotonicity. 2015, New Generation Computing.

[268] Christopher M. Bishop et al. Bayesian Hierarchical Mixtures of Experts. 2002, UAI.

[269] J.-S. R. Jang et al. Structure determination in fuzzy modeling: a fuzzy CART approach. 1994, Proceedings of 1994 IEEE 3rd International Fuzzy Systems Conference.

[270] L. Breitling. dagR: a suite of R functions for directed acyclic graphs. 2010, Epidemiology.

[271] Zabe Bent. San Francisco Mobility, Access, and Pricing Study. 2010.

[272] C. Scott. Calibrated asymmetric surrogate losses. 2012.

[273] S. Bhattacharya et al. Transdimensional transformation based Markov chain Monte Carlo. 2014, Brazilian Journal of Probability and Statistics.

[274] Wei-Yin Loh et al. Fifty Years of Classification and Regression Trees. 2014.

[275] J. Pearl et al. An Axiomatic Characterization of Causal Counterfactuals. 1998.

[276] D. Dey et al. Flexible generalized t-link models for binary response data. 2008.

[277] John M. Rose et al. Can you ever be certain? Reducing hypothetical bias in stated choice experiments via respondent reported choice certainty. 2016.

[278] P. Green et al. Reversible jump MCMC. 2009.

[279] W. Greene et al. Recent Progress on Endogeneity in Choice Modeling. 2005.

[280] Francisco J. Aranda-Ordaz et al. On Two Families of Transformations to Additivity for Binary Response Data. 1981.

[281] A. Zeileis et al. Gaining insight with recursive partitioning of generalized linear models. 2013.

[282] Christian Borgelt et al. An implementation of the FP-growth algorithm. 2005.

[283] S. Mukhopadhyay et al. On generalized multinomial models and joint percentile estimation. 2012, arXiv:1211.6915.

[284] Peter Boatwright et al. A Satisficing Choice Model. 2012, Mark. Sci.

[285] G. Gigerenzer et al. Reasoning the fast and frugal way: models of bounded rationality. 1996, Psychological Review.

[286] Yudi Pawitan et al. A Reminder of the Fallibility of the Wald Statistic: Likelihood Explanation. 2000.

[287] J. Swait et al. Probabilistic choice set generation in transportation demand models. 1984.

[288] W. Härdle et al. Semiparametric Single Index Versus Fixed Link Function Modelling. 1997.

[289] Junyi Zhang et al. Developing an integrated scobit-based activity participation and time allocation model to explore influence of childcare on women's time use behaviour. 2012.

[290] Joffre Swait et al. Context Dependence and Aggregation in Disaggregate Choice Analysis. 2002.

[291] John Mingers et al. An Empirical Comparison of Pruning Methods for Decision Tree Induction. 1989, Machine Learning.

[292] Clyde H. Coombs. Mathematical Models in Psychological Scaling. 1951.