Mining Connected Vehicle Data for Beneficial Patterns in Dubai Taxi Operations

On-demand shared mobility services such as Uber and microtransit are steadily penetrating the worldwide market for traditional dispatched taxi services, and taxi companies are seeking ways to compete. This study mined large-scale mobility data from connected taxis to discover patterns that may inform strategies for improving the dispatched taxi business. Manually cleaning and filtering large-scale mobility data that contains GPS information is impractical. This research therefore contributes and demonstrates an automated cleaning and filtering method suited to such datasets. The method defines three filter variables and applies a layered statistical filtering technique that eliminates outlier records, so that the distribution of each variable in the remaining data matches its expected theoretical distribution. Chi-squared tests evaluate the quality of the cleaned data by comparing the distributions of the three variables with their expected distributions. Overall, the method removed approximately 5% of the data, comprising both obvious errors and poor-quality outliers. Mining the cleaned data then revealed that trip production in Dubai peaks when the same two drivers, and only those two, operate the same taxi. This finding would not have been possible without access to proprietary data containing unique identifiers for both drivers and taxis; datasets that identify individual drivers are not publicly available.
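
The layered filtering and chi-squared check described above can be illustrated with a minimal sketch. The abstract names neither the three filter variables nor their expected distributions, so everything specific here is an assumption made for illustration: the variable names (`distance_km`, `duration_min`, `speed_kmh`), the lognormal expected distribution, the three-sigma trimming rule, and the synthetic data. It is not the study's actual procedure.

```python
import math
import random
from statistics import NormalDist, mean, stdev

# Hypothetical filter variables; the abstract does not name the real ones.
VARIABLES = ("distance_km", "duration_min", "speed_kmh")


def layered_filter(records, keys, k=3.0, max_passes=10):
    """Drop obvious errors, then trim outliers one variable (layer) at a time."""
    # Layer 0: obvious errors (non-positive values cannot be log-transformed).
    kept = [r for r in records if all(r[key] > 0 for key in keys)]
    for key in keys:
        # Repeat the trim until this layer stops removing records.
        for _ in range(max_passes):
            logs = [math.log(r[key]) for r in kept]
            mu, sigma = mean(logs), stdev(logs)
            survivors = [r for r, x in zip(kept, logs) if abs(x - mu) <= k * sigma]
            if len(survivors) == len(kept):
                break
            kept = survivors
    return kept


def chi_squared_statistic(values, bins=10):
    """Pearson statistic of log(values) against a fitted normal distribution,
    computed over equal-probability bins (assumed lognormal model)."""
    logs = [math.log(v) for v in values]
    dist = NormalDist(mean(logs), stdev(logs))
    edges = [dist.inv_cdf(i / bins) for i in range(1, bins)]
    observed = [0] * bins
    for x in logs:
        observed[sum(x > e for e in edges)] += 1
    expected = len(logs) / bins
    return sum((o - expected) ** 2 / expected for o in observed)


# Synthetic demonstration data, not taken from the study.
rng = random.Random(0)
records = [{v: rng.lognormvariate(3.0, 0.4) for v in VARIABLES} for _ in range(2000)]
for r in records[:100]:
    r["speed_kmh"] = 1e4  # inject implausible values into 5% of the records

cleaned = layered_filter(records, VARIABLES)
before = chi_squared_statistic([r["speed_kmh"] for r in records])
after = chi_squared_statistic([r["speed_kmh"] for r in cleaned])
print(len(records) - len(cleaned), round(before, 1), round(after, 1))
```

A smaller statistic after cleaning indicates that the remaining values sit closer to the assumed distribution. A full test would compare the statistic against a chi-squared critical value for the appropriate degrees of freedom.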
