TAREEG: a MapReduce-based system for extracting spatial data from OpenStreetMap

Real spatial data, e.g., detailed road networks, rivers, buildings, and parks, is not easily available for most of the world. This hinders the practicality of many research ideas that need real spatial data for testing and experiments. Such data is often available for governmental use or at major software companies, but it is prohibitively expensive for academia or individual researchers to build or buy. This paper presents TAREEG, a web service that makes real spatial data, from anywhere in the world, available at the fingertips of every researcher or individual. TAREEG gets all its data by leveraging the richness of the OpenStreetMap data set, the most comprehensive available spatial data of the world. Yet it is still challenging to obtain OpenStreetMap data due to size limitations, its special data format, and the noisy nature of spatial data. TAREEG employs MapReduce-based techniques to make it efficient and easy to extract OpenStreetMap data in a standard form with minimal effort. Experimental results show that TAREEG is highly accurate and efficient.
