The usefulness of the Sequence Alignment Methods in validating rule-based activity-based forecasting models

This paper aims to improve the understanding of rule-based activity-based models by proposing a new level of validation, the process model level, in A Learning-based Transportation Oriented Simulation System (ALBATROSS). To that end, the work activity process model, which comprises six decision steps, is investigated. Each decision step is evaluated during the prediction of individuals’ schedules. Certain decision steps affect the execution pattern of the work activity process model, so execution within the process model exhibits activation dependency. As a result, the execution and evaluation path branches for each agent under examination. Sequence Alignment Methods (SAM) can be used to evaluate how similar or dissimilar the predicted and observed decision sequences are at the agent level. The original Chi-squared Automatic Interaction Detector (CHAID) decision trees used at each decision step in ALBATROSS are compared with other well-known induction methods selected for the purpose of the analyses. The models are validated at four levels: the classifier (decision step) level, using confusion matrix statistics; the work activity trip Origin–Destination matrix level; the time-of-day level for work activity start times, using a correlation coefficient; and the process model level, using SAM. The validation results at the proposed process model level are consistent with those at the other validation levels. They also provide additional information that helps in understanding the process model’s behavior. Hence, introducing a new level of validation yields new knowledge, helps assess the predictive performance of rule-based activity-based models, and assists in identifying critical decision steps in the work activity process model.
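To illustrate the agent-level comparison of decision sequences, the following is a minimal sketch of a uni-dimensional sequence alignment (Levenshtein-style edit distance) between an observed and a predicted sequence. It is not necessarily the SAM variant used in the paper: the function name `alignment_distance`, the unit insertion/deletion and substitution costs, and the decision labels in the example are all assumptions made for illustration.

```python
from typing import Hashable, Sequence


def alignment_distance(
    observed: Sequence[Hashable],
    predicted: Sequence[Hashable],
    indel_cost: float = 1.0,  # assumed cost of inserting or deleting one decision
    sub_cost: float = 1.0,    # assumed cost of substituting one decision for another
) -> float:
    """Minimum total cost of edits that turn `observed` into `predicted`."""
    n, m = len(observed), len(predicted)
    # prev[j] holds the cost of aligning the first i-1 observed items
    # with the first j predicted items.
    prev = [j * indel_cost for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * indel_cost] + [0.0] * m
        for j in range(1, m + 1):
            mismatch = 0.0 if observed[i - 1] == predicted[j - 1] else sub_cost
            curr[j] = min(
                prev[j] + indel_cost,      # delete an observed decision
                curr[j - 1] + indel_cost,  # insert a predicted decision
                prev[j - 1] + mismatch,    # match or substitute
            )
        prev = curr
    return prev[m]


# Hypothetical decision sequences for one agent; the labels are illustrative only.
observed = ["work_yes", "one_episode", "duration_long", "start_morning"]
predicted = ["work_yes", "two_episodes", "duration_long", "break_short", "start_morning"]

print(alignment_distance(observed, predicted))  # one substitution + one insertion
```

A distance of zero means the predicted decision sequence reproduces the observed one exactly; larger values indicate that more decisions differ or that the process model executed a different set of steps for that agent.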
