Mining open datasets for transparency in taxi transport in metropolitan environments

Uber has recently introduced novel practices in urban taxi transport. Journey prices can change dynamically in almost real time and can also vary geographically from one area of a city to another, a strategy known as surge pricing. In this paper, we explore the power of the new generation of open datasets for understanding the impact of the disruptive technologies that are emerging in public transport. With the primary goal of a more transparent economic landscape for urban commuters, we provide a direct price comparison between Uber and the Yellow Cab company in New York. We find that Uber, despite its lower standard pricing rates, effectively charges higher fares on average, especially on taxi journeys that are short in length but frequent in occurrence. Building on this insight, we develop a smartphone application, OpenStreetCab, that offers mobile users personalized advice on which taxi provider is cheaper for their journey. Almost five months after its launch, the app has attracted more than three thousand users in a single city. Their journey queries have provided additional insights into the potential savings that similar technologies can offer urban commuters; most notably, a user in New York saves on average 6 U.S. dollars per taxi journey by picking the cheaper taxi provider. We run extensive experiments showing that Uber's surge pricing is the driving factor behind higher journey prices and, therefore, behind higher potential savings for our application's users. Finally, motivated by the observation that Uber's surge pricing occurs more frequently than intuitively expected, we formulate a prediction task whose aim is to predict a geographic area's tendency to surge. Using data exogenous to Uber, in particular Yellow Cab and Foursquare data, we show how it is possible to estimate customer demand within an area, and by extension surge pricing, with high accuracy.
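
The kind of advice OpenStreetCab gives can be sketched as a comparison of two fare estimates for the same journey. What follows is a minimal illustration under stated assumptions, not the app's actual implementation: the `Tariff` model (base fare plus per-mile and per-minute charges, an optional minimum fare, and a surge multiplier applied to Uber only), the names `Tariff` and `recommend`, and every rate and journey in it are hypothetical placeholders. The model is a simplification and will not reproduce real fares.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Tariff:
    """Simplified distance-plus-time tariff. All rates are hypothetical placeholders."""
    base: float            # base fare (USD)
    per_mile: float        # USD per mile
    per_minute: float      # USD per minute of journey time
    minimum: float = 0.0   # minimum fare (USD)

    def fare(self, miles: float, minutes: float, surge: float = 1.0) -> float:
        raw = self.base + self.per_mile * miles + self.per_minute * minutes
        # Assumption: a surge multiplier scales the whole fare, minimum included.
        return max(raw, self.minimum) * surge


def recommend(yellow: Tariff, uber: Tariff, miles: float, minutes: float,
              surge: float = 1.0) -> tuple[str, float]:
    """Return the cheaper provider and the saving versus the other one (USD)."""
    yellow_fare = yellow.fare(miles, minutes)          # no surge for Yellow Cab
    uber_fare = uber.fare(miles, minutes, surge)
    if uber_fare < yellow_fare:
        return "Uber", yellow_fare - uber_fare
    return "Yellow Cab", uber_fare - yellow_fare       # ties go to Yellow Cab


# Illustrative tariffs and journey queries (miles, minutes, surge); not real data.
YELLOW = Tariff(base=2.5, per_mile=2.5, per_minute=0.5)
UBER = Tariff(base=3.0, per_mile=1.75, per_minute=0.35, minimum=8.0)
QUERIES = [(1.0, 8.0, 1.0), (5.0, 20.0, 1.5)]

results = [recommend(YELLOW, UBER, m, t, s) for m, t, s in QUERIES]
average_saving = mean(saving for _, saving in results)

if __name__ == "__main__":
    for (m, t, s), (provider, saving) in zip(QUERIES, results):
        print(f"{m} mi, {t} min, surge x{s}: {provider} saves ${saving:.2f}")
    print(f"average saving per journey: ${average_saving:.2f}")
```

Under these made-up rates, Uber would be cheaper on the second journey without surge (18.75 vs 25.00 USD), but the 1.5x multiplier flips the recommendation to Yellow Cab. This mirrors, in toy form, how surge pricing can drive the potential savings described above. The per-journey saving is the gap between the pricier and the cheaper estimate, averaged over all queries.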
