Optimizing copious activity type classes based on classification accuracy and entropy retention

Despite their advantages, big transport data also have a considerable drawback: personal and activity-travel information is often lacking, so it must be inferred with data mining techniques. Some studies predict many distinct activity type classes (ATCs), while others merge multiple activity types into larger ATCs. Merging improves the accuracy of activity inference but destroys important activity information, and previous studies do not provide a strong justification for the practice. An objectively optimized set of ATCs, one that balances model prediction accuracy against preservation of the activity information in the original data, is therefore essential. Previous research developed a classification methodology in which the optimal set of ATCs was identified by analyzing all possible ATC combinations. However, this approach is computationally infeasible for data sets such as the US National Household Travel Survey (NHTS) 2009, which comprises 36 ATCs (home activity excluded) and therefore 3.82 × 10^30 unique combinations (the count grows at least exponentially with the number of ATCs). The aim of this paper is to determine which original ATCs should be grouped into a new class for data sets where evaluating all ATC combinations is impossible or impractical. The proposed method defines an optimization parameter U, based on classification accuracy and information retention, which is maximized by an iterative local search algorithm. The optimal set of ATCs for the NHTS 2009 data set was determined. A comparison shows that this optimum is considerably better than many expert-opinion activity type classification systems. Convergence was confirmed and large performance gains were found.
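
To make the scale of the search space concrete, the sketch below counts the ways of grouping n labelled classes into non-empty groups (the Bell number) and then runs a simple greedy local search over pairwise merges. It is an illustration under stated assumptions, not the algorithm of the paper: the names bell_number, local_search and toy_utility are hypothetical, and toy_utility is only a stand-in for the parameter U, whose actual definition (classification accuracy combined with information retention) is not given in this abstract.

```python
from itertools import combinations


def bell_number(n):
    """Count the partitions of n labelled items into non-empty groups (Bell triangle)."""
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]  # each row starts with the last entry of the previous row
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


def local_search(classes, utility):
    """Greedy hill climbing: apply the best pairwise merge until utility stops improving.

    ASSUMPTION: this merge-only neighbourhood is a simplification chosen for
    illustration; the paper's local search may use different moves.
    """
    partition = [frozenset([c]) for c in classes]
    best = utility(partition)
    while len(partition) > 1:
        candidates = []
        for a, b in combinations(partition, 2):
            merged = [g for g in partition if g not in (a, b)] + [a | b]
            candidates.append((utility(merged), merged))
        score, merged = max(candidates, key=lambda item: item[0])
        if score <= best:
            break  # local optimum: no single merge improves the utility
        best, partition = score, merged
    return partition, best


def toy_utility(partition):
    """HYPOTHETICAL stand-in for U: prefers exactly three groups."""
    return -abs(len(partition) - 3)


if __name__ == "__main__":
    print(f"groupings of 36 classes: {bell_number(36):.2e}")
    groups, score = local_search(["work", "school", "shop", "leisure", "other"], toy_utility)
    print(len(groups), score)
```

Each iteration evaluates only the pairwise merges of the current grouping (at most n(n-1)/2 candidates), which is what makes a local search tractable where full enumeration is not. The price is that a greedy search of this kind can stop at a local rather than a global optimum.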
