The Simpler The Better: A Unified Approach to Predicting Original Taxi Demands based on Large-Scale Online Platforms

Taxi-calling apps are increasingly popular because they efficiently dispatch idle taxis to passengers in need. To balance taxi supply and demand precisely, online taxicab platforms need to predict the Unit Original Taxi Demand (UOTD), defined as the number of taxi-calling requests submitted per unit time (e.g., every hour) and per unit region (e.g., each POI). Predicting UOTD is non-trivial for large-scale industrial online taxicab platforms because both accuracy and flexibility are essential. Complex non-linear models such as GBRT and deep learning are generally accurate, yet they require labor-intensive model redesign whenever the scenario changes (e.g., extra constraints due to new regulations). To predict UOTD accurately while remaining flexible to scenario changes, we propose LinUOTD, a unified linear regression model with more than 200 million feature dimensions. The simple model structure eliminates the need for repeated model redesign, while the high-dimensional features contribute to accurate UOTD prediction. We further design a series of optimization techniques for efficient model training and updating. Evaluations on two large-scale datasets from an industrial online taxicab platform verify that LinUOTD outperforms popular non-linear models in accuracy. We envision our experience of adopting simple linear models with high-dimensional features for UOTD prediction as a pilot study that can shed light on other industrial large-scale spatio-temporal prediction problems.
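
To make the idea of a linear model over a very large sparse feature space concrete, the following is a minimal sketch and not the LinUOTD implementation. It assumes a hashing trick to map feature names into a fixed index space and plain stochastic gradient descent on squared error. The abstract does not specify the features or the optimization techniques, so the feature names, the `DIM` size, the learning rate, and the toy samples below are all hypothetical.

```python
import zlib

# Hypothetical hashed feature space. The paper reports 200M+ dimensions;
# a much smaller space is used here so the sketch runs instantly.
DIM = 2 ** 20


def featurize(region, hour, weekday):
    """Map one (region, time) cell to the indices of its active binary features.

    The feature names, including the region-hour combination, are
    illustrative assumptions and are not taken from the paper.
    """
    names = [
        "bias",
        f"region={region}",
        f"hour={hour}",
        f"weekday={weekday}",
        f"region={region}&hour={hour}",
    ]
    return [zlib.crc32(name.encode("utf-8")) % DIM for name in names]


def predict(weights, indices):
    """Linear prediction: sum of the weights of the active features."""
    return sum(weights.get(i, 0.0) for i in indices)


def train(samples, epochs=200, lr=0.05):
    """Fit sparse weights with plain SGD on squared error.

    Only weights of features that actually occur are stored, so memory
    grows with the number of observed features rather than with DIM.
    """
    weights = {}
    for _ in range(epochs):
        for region, hour, weekday, demand in samples:
            indices = featurize(region, hour, weekday)
            error = predict(weights, indices) - demand
            for i in indices:
                weights[i] = weights.get(i, 0.0) - lr * error
    return weights


# Made-up toy data: (region, hour, weekday, observed demand).
toy_samples = [
    ("poi_a", 8, 0, 12.0),
    ("poi_a", 20, 0, 5.0),
    ("poi_b", 8, 0, 3.0),
    ("poi_b", 20, 0, 7.0),
]

if __name__ == "__main__":
    model = train(toy_samples)
    print(round(predict(model, featurize("poi_a", 8, 0)), 2))
```

In a sketch like this, adapting to a new scenario amounts to adding entries to the `names` list rather than redesigning the model, which is one way to read the flexibility argument above. Whether LinUOTD handles scenario changes in this manner is not stated in the abstract.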
