Big data computation of taxi movement in New York City

We seek to extract and explore statistics that characterize New York City traffic flows from 700 million taxi trips in the 2010–2013 New York City taxi dataset. This paper presents a two-part solution for this intensive computation: space and time design considerations for estimating taxi trajectories with Dijkstra's algorithm, and job parallelization and scheduling with HTCondor. Our contribution is a solution that reduces execution time from 3,000 days to less than a day, accompanied by a detailed analysis of the necessary design decisions.
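
The trajectory-estimation step can be illustrated with a minimal Python sketch that estimates each trip's route as a shortest path. It assumes each trip's pickup and dropoff have already been snapped to road-network nodes and that edge weights are nonnegative travel costs. The graph representation, the function names (`dijkstra`, `path_to`, `estimate_trajectories`), and the grouping of trips by pickup node are illustrative assumptions, not the paper's implementation.

```python
import heapq
from collections import defaultdict


def dijkstra(graph, source):
    """Single-source shortest paths using a binary heap.

    graph: dict mapping node -> list of (neighbor, weight) pairs, weights >= 0.
    Returns (dist, pred): shortest distances and the predecessor map from source.
    """
    dist = {source: 0.0}
    pred = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry: a shorter path to u was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                pred[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, pred


def path_to(pred, source, target):
    """Rebuild the node sequence source -> target; None if target is unreachable."""
    if target != source and target not in pred:
        return None
    path = [target]
    while path[-1] != source:
        path.append(pred[path[-1]])
    path.reverse()
    return path


def estimate_trajectories(graph, trips):
    """Estimate each trip's route as a shortest path on the road graph.

    trips: list of (pickup_node, dropoff_node) pairs (hypothetical pre-snapped input).
    Trips are grouped by pickup node so Dijkstra runs once per distinct origin
    instead of once per trip.
    """
    by_origin = defaultdict(list)
    for i, (origin, dest) in enumerate(trips):
        by_origin[origin].append((i, dest))
    routes = [None] * len(trips)
    for origin, items in by_origin.items():
        _, pred = dijkstra(graph, origin)
        for i, dest in items:
            routes[i] = path_to(pred, origin, dest)
    return routes
```

On a toy graph `{"A": [("B", 1.0), ("C", 4.0)], "B": [("C", 1.0), ("D", 5.0)], "C": [("D", 1.0)], "D": []}`, the trip `("A", "D")` is routed as `["A", "B", "C", "D"]` rather than along the direct but costlier edges. Grouping by origin is one example of a time/space trade-off: one predecessor map serves every trip that starts at the same node, at the cost of holding that map in memory. Because the origin groups are independent, they could also be split into separate jobs for a scheduler such as HTCondor.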
