Skyline and reverse skyline query processing in SpatialHadoop

Abstract In this paper, we study the problem of skyline and reverse skyline computation using SpatialHadoop, an extension of Hadoop that adds spatial awareness. Exploiting spatial indexing structures and the spatial properties of the data can benefit MapReduce-based methods by reducing the reading, writing, computational and communication overhead. We propose two methods for skyline and reverse skyline computation, which operate in the spatially aware environment that SpatialHadoop provides. This environment allows the initial dataset to be filtered with existing state-of-the-art indexing approaches, so that an answer can be retrieved efficiently. The proposed algorithms make full use of the indexing mechanisms provided by SpatialHadoop and have been tested against large-scale datasets, including a real-life OpenStreetMap dataset. To the best of our knowledge, this is the first work that studies reverse skyline over SpatialHadoop.
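For readers unfamiliar with the two query types, the following is a minimal, single-machine sketch of their standard definitions. It is not the algorithm proposed in this paper and does not use the SpatialHadoop API. It assumes that smaller values are preferred in every dimension, and the function names (`dominates`, `skyline`, `partitioned_skyline`, `reverse_skyline`) are hypothetical, chosen for illustration only. The partitioned variant only mimics the general MapReduce pattern of computing a local skyline per partition and then merging the local results.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Iterable[Point]) -> List[Point]:
    """Block-nested-loop skyline: keep the points that no other point dominates."""
    result: List[Point] = []
    for p in points:
        if any(dominates(s, p) for s in result):
            continue  # p is dominated by a current candidate, discard it
        # p survives: drop candidates that p dominates, then keep p
        result = [s for s in result if not dominates(p, s)]
        result.append(p)
    return result


def partitioned_skyline(partitions: Sequence[Sequence[Point]]) -> List[Point]:
    """Local skyline per partition (map-like step), then a skyline over the
    union of the local results (reduce-like step)."""
    local_skylines = [skyline(part) for part in partitions]
    merged = [p for local in local_skylines for p in local]
    return skyline(merged)


def reverse_skyline(points: Sequence[Point], q: Point) -> List[Point]:
    """Brute-force reverse skyline of query q: a point p qualifies if no
    other point is at least as close to p as q in every dimension and
    strictly closer in at least one (distances are per-dimension absolute
    differences measured from p)."""
    result: List[Point] = []
    for i, p in enumerate(points):
        q_dist = tuple(abs(a - b) for a, b in zip(q, p))
        dominated = False
        for j, other in enumerate(points):
            if j == i:
                continue
            o_dist = tuple(abs(a - b) for a, b in zip(other, p))
            if dominates(o_dist, q_dist):
                dominated = True
                break
        if not dominated:
            result.append(p)
    return result


if __name__ == "__main__":
    parts = [[(1, 5), (3, 3), (6, 6)], [(2, 2), (5, 1), (4, 4)]]
    print(partitioned_skyline(parts))
    print(reverse_skyline([(1, 1), (4, 4), (5, 5)], (3, 3)))
```

Merging local skylines is safe because any point in the global skyline is undominated within its own partition and therefore survives the local step. The brute-force reverse skyline is quadratic in the number of points and is meant only to make the definition concrete.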
