Data selection in machine learning for identifying trip purposes and travel modes from longitudinal GPS data collection lasting for seasons

Abstract Machine learning methods are widely used to identify trip purposes and travel modes from Global Positioning System (GPS) trajectory data. The choice of data for the training and test sets is important in these methods, yet the feasibility and effects of drawing these data from different periods of the year remain unknown. This matters because GPS-based collection reduces the burden on participants to such an extent that a survey can span several seasons, each of which may have distinct characteristics. To bridge this gap, this paper applies Aslan & Zech’s test (AZ-test) and then Random Forests (RF) to investigate how selecting training and test data from different seasons influences identification. The empirical analysis uses a dataset collected in Hakodate, Japan, a city with distinct seasons. The AZ-test results suggest that the explanatory variables of two data sets from different seasons follow different distributions, and that a data set drawn from two seasons and a data set drawn from a single season also follow different distributions. However, the test gives contradictory results in some cases, so RF is used to examine in more detail how identification accuracy varies. RF confirms the AZ-test findings in most cases. In addition, the RF results show that including GIS features as explanatory variables improves identification accuracy, whereas including weather features reduces it.
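The seasonal comparison described in the abstract can be illustrated with a minimal sketch of an energy-type two-sample test in the spirit of the AZ-test. Several parts are assumptions made for illustration and are not taken from the paper: the logarithmic weight R(r) = -ln(r + eps), the permutation-based p-value, the function names, and the idea of representing each trip as a tuple of numeric explanatory variables.

```python
import math
import random


def energy_statistic(xs, ys, eps=1e-9):
    """Energy-type two-sample statistic with weight R(r) = -ln(r + eps).

    xs and ys are lists of equal-length numeric tuples, e.g. the
    explanatory variables of trips from two seasons (assumed layout).
    """
    def weight(p, q):
        return -math.log(math.dist(p, q) + eps)

    n, m = len(xs), len(ys)
    # Pairwise terms within each sample (i < j) and between the samples.
    within_x = sum(weight(xs[i], xs[j]) for i in range(n) for j in range(i + 1, n))
    within_y = sum(weight(ys[i], ys[j]) for i in range(m) for j in range(i + 1, m))
    between = sum(weight(p, q) for p in xs for q in ys)
    return within_x / n**2 + within_y / m**2 - between / (n * m)


def permutation_p_value(xs, ys, n_perm=200, seed=0):
    """Estimate a p-value by reshuffling the pooled sample (an assumption)."""
    rng = random.Random(seed)
    observed = energy_statistic(xs, ys)
    pooled = list(xs) + list(ys)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = energy_statistic(pooled[:len(xs)], pooled[len(xs):])
        if permuted >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

A small p-value would suggest that the two samples follow different distributions, which is the kind of evidence the abstract reports for data sets from distinct seasons. The statistic grows when points from the two samples lie farther apart than points within each sample, so larger values indicate a stronger mismatch.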
