Household travel mode choice estimation with large-scale data—an empirical analysis based on mobility data in Milan

Abstract

Data analysis plays a key role in supporting the development of sustainable transportation. Using large-scale household mobility survey data collected in Milan, Italy, during 2005–2006, we study whether large-scale data improve the accuracy of household travel mode estimation. This paper applies three methods to estimate household travel mode: a multinomial logit (MNL) model, random forest (RF), and support vector machine (SVM). With the full sample, their accuracies are 70.41%, 71.89%, and 72.74%, respectively. The accuracies of all three methods fluctuate strongly when the sample size is below 20,000. They then gradually stabilize as the sample size grows. Once they have stabilized, further increases in sample size do not significantly improve accuracy. We also study the travel characteristics derived from the large-scale survey data, which are fundamental to developing a sustainable transportation system. The data include five explanatory variables: household size (HS), vehicle ownership, household income (HI), travel distance, and travel time. They also include one response variable, household travel mode, which has four categories: public transport (PT), private car, combined use of PT and private car, and other modes (e.g., walking). We further assess the importance of each explanatory variable for estimating household travel mode choice with the MNL model. Vehicle ownership is the most critical factor influencing household travel mode choice, followed by travel distance, travel time, HS, and HI. This ranking is consistent with the one obtained from the RF approach.
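The following is a minimal sketch in Python (standard library only) that makes the MNL estimation and the sample-size experiment concrete. In an MNL classifier, each household's mode probabilities are a softmax over utilities that are linear in the explanatory variables. The predicted mode is the most probable alternative, and accuracy is the share of correct predictions. Everything here is an illustrative assumption, not the paper's implementation. The data are synthetic: standard-normal features stand in for HS, vehicle ownership, HI, travel distance, and travel time. The coefficients in `TRUE_BETAS` are invented. The fitting routine is plain batch gradient ascent rather than the authors' estimator. All function names are hypothetical.

```python
import math
import random

# Hypothetical label order; index 0 ("other", e.g. walking) is the MNL base alternative.
MODES = ["other", "public_transport", "private_car", "pt_and_car"]
# Hypothetical feature order: intercept, HS, vehicle ownership, HI, distance, time.
N_FEATURES = 6
# Invented coefficients used only to simulate data (not estimates from the Milan survey).
TRUE_BETAS = [
    [0.0] * N_FEATURES,                 # other (base, fixed at zero)
    [0.2, 0.1, -1.0, -0.2, 0.8, 0.3],   # public transport
    [0.3, 0.2, 1.5, 0.3, 0.6, -0.3],    # private car
    [-0.5, 0.4, 0.8, 0.1, 1.0, 0.2],    # PT and car
]


def mnl_probabilities(x, betas):
    """Softmax over linear utilities V_k = betas[k] . x, one coefficient row per mode."""
    utilities = [sum(b * xi for b, xi in zip(beta, x)) for beta in betas]
    m = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x, betas):
    """Predicted mode is the alternative with the highest MNL probability."""
    probs = mnl_probabilities(x, betas)
    return max(range(len(probs)), key=probs.__getitem__)


def accuracy(X, y, betas):
    """Share of households whose observed mode equals the predicted mode."""
    return sum(predict(x, betas) == t for x, t in zip(X, y)) / len(y)


def mean_log_likelihood(X, y, betas):
    return sum(math.log(mnl_probabilities(x, betas)[t]) for x, t in zip(X, y)) / len(y)


def fit_mnl(X, y, n_modes=4, lr=0.5, epochs=100):
    """Batch gradient ascent on the mean log-likelihood.

    The gradient for mode k is mean((1[k == t] - p_k) * x); the base
    alternative's row stays at zero so the model is identified."""
    d, n = len(X[0]), len(y)
    betas = [[0.0] * d for _ in range(n_modes)]
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(n_modes)]
        for x, t in zip(X, y):
            probs = mnl_probabilities(x, betas)
            for k in range(1, n_modes):
                residual = (1.0 if k == t else 0.0) - probs[k]
                for j in range(d):
                    grad[k][j] += residual * x[j]
        for k in range(1, n_modes):
            for j in range(d):
                betas[k][j] += lr * grad[k][j] / n
    return betas


def simulate(n, true_betas, rng):
    """Synthetic households: intercept plus standard-normal features, with modes
    drawn from the MNL probabilities implied by true_betas."""
    X, y = [], []
    for _ in range(n):
        x = [1.0] + [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES - 1)]
        probs = mnl_probabilities(x, true_betas)
        X.append(x)
        y.append(rng.choices(range(len(probs)), weights=probs)[0])
    return X, y


def accuracy_by_sample_size(sizes, true_betas, n_test=1000, seed=0):
    """Fit on nested subsamples of increasing size; score each on one held-out set."""
    rng = random.Random(seed)
    X_test, y_test = simulate(n_test, true_betas, rng)
    X_pool, y_pool = simulate(max(sizes), true_betas, rng)
    results = {}
    for n in sizes:
        betas = fit_mnl(X_pool[:n], y_pool[:n], n_modes=len(true_betas))
        results[n] = accuracy(X_test, y_test, betas)
    return results
```

For example, `accuracy_by_sample_size((500, 2000, 8000), TRUE_BETAS)` traces accuracy against sample size in the way the abstract describes. Its numbers reflect only the synthetic data, not the Milan survey. Pure-Python loops become slow for samples in the tens of thousands, so a vectorized library is the practical choice at that scale.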
