Smarter outlier detection and deeper understanding of large-scale taxi trip records: a case study of NYC

Outlier detection in large-scale taxi trip records poses significant technical challenges because of huge data volumes and complex semantics. In this paper, we report our preliminary work on detecting outliers among 166 million taxi trips recorded in New York City (NYC) in 2009, using efficient spatial analysis and network analysis on a NAVTEQ street network with half a million edges. As a byproduct of the large-scale shortest-path computation used in outlier detection, betweenness centralities of street network edges are computed and mapped. These techniques can help to better understand the connection strengths among different parts of NYC from the large-scale taxi trip records.
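As a rough illustration of the idea, the following Python sketch compares each trip's recorded distance with the shortest-path distance on a street graph and counts how often each edge lies on a shortest path. It is a minimal sketch under stated assumptions, not the method used in the paper: the toy graph, the function names `shortest_path` and `flag_outliers`, and the `max_ratio` threshold are all hypothetical, and the per-edge trip counts are only a trip-weighted stand-in for betweenness centrality.

```python
import heapq
from collections import defaultdict


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict: node -> list of (neighbor, length).

    Returns (distance, list of edges on the path); (inf, []) if unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    # Walk the predecessor links back from the target to rebuild the path.
    edges = []
    node = target
    while node != source:
        edges.append((prev[node], node))
        node = prev[node]
    edges.reverse()
    return dist[target], edges


def flag_outliers(graph, trips, max_ratio=2.0):
    """Flag trips whose recorded distance disagrees with the network distance.

    trips: iterable of (trip_id, origin, destination, recorded_distance).
    max_ratio is a hypothetical threshold, not a value from the paper.
    Trips with no path (or identical origin and destination) are also flagged.
    """
    edge_counts = defaultdict(int)
    outliers = []
    for trip_id, origin, dest, recorded in trips:
        d, edges = shortest_path(graph, origin, dest)
        if not edges or recorded > max_ratio * d or recorded < d / max_ratio:
            outliers.append(trip_id)
            continue
        # Byproduct: count how many accepted trips use each edge.
        for e in edges:
            edge_counts[e] += 1
    return outliers, dict(edge_counts)


# Toy street network (hypothetical): lengths in arbitrary units.
graph = {
    "A": [("B", 1.0), ("C", 3.0)],
    "B": [("A", 1.0), ("C", 1.0)],
    "C": [("A", 3.0), ("B", 1.0)],
}
trips = [
    (1, "A", "C", 2.1),  # close to the shortest path A-B-C
    (2, "A", "C", 9.0),  # far longer than the network distance
    (3, "B", "C", 1.0),
    (4, "A", "D", 1.0),  # destination not on the network
]
outliers, edge_counts = flag_outliers(graph, trips)
```

At the scale described above, running one Dijkstra search per trip would likely be too slow; the sketch only shows how the outlier test and the edge counts can share the same shortest-path computation.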
