Parallel Skyline Queries

In this paper, we design and analyze parallel algorithms for skyline queries. The skyline of a multidimensional set consists of the points for which no other point exists that is at least as good along every dimension. As a framework for parallel computation, we use both the MP model proposed in Koutris and Suciu (2011), which requires that the data be perfectly load-balanced, and a variation of the model in Afrati and Ullman (2010), the GMP model, which imposes weaker load-balancing constraints. In addition to load balancing, we want to minimize the number of blocking steps, in which all processors must wait and synchronize. We propose a 2-step algorithm in the MP model for datasets of any dimension, as well as a 1-step algorithm for the case of 2 and 3 dimensions. Finally, we present a 1-step algorithm in the GMP model for any number of dimensions and a 1-step algorithm in the MP model for uniform distributions of data points.
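
To make the skyline definition concrete, the following is a minimal sequential sketch, not one of the parallel algorithms of the paper. It assumes that smaller values are better in every dimension and that points are distinct tuples, so "at least as good along every dimension" for a different point implies strictly better in at least one. The names `dominates` and `skyline` are illustrative only.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p is a different point that is at least as good as q
    in every dimension (assumption: smaller is better)."""
    return p != q and all(a <= b for a, b in zip(p, q))


def skyline(points: Iterable[Point]) -> List[Point]:
    """Naive quadratic skyline: keep the points dominated by no other point."""
    pts = set(points)  # treat the input as a set of distinct points
    return sorted(q for q in pts if not any(dominates(p, q) for p in pts))


if __name__ == "__main__":
    data = [(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)]
    print(skyline(data))
```

Here `(3, 3)` and `(4, 4)` are both dominated by `(2, 2)`, so only the three remaining points survive. The sketch compares every pair of points on a single machine; it says nothing about load balancing or synchronization steps, which are the concerns of the MP and GMP models.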

[1] Sergei Vassilvitskii, et al. A model of computation for MapReduce, 2010, SODA '10.

[2] Jirí Matousek, et al. Computing Dominances in E^n, 1991, Inf. Process. Lett.

[3] Dan Suciu, et al. Parallel evaluation of conjunctive queries, 2011, PODS.

[4] Jonghyun Park, et al. Parallel Skyline Computation on Multicore Architectures, 2009, ICDE.

[5] Donald Kossmann, et al. The Skyline operator, 2001, Proceedings 17th International Conference on Data Engineering.

[6] Christos Doulkeridis, et al. AGiDS: A Grid-Based Strategy for Distributed Skyline Query Processing, 2009, Globe.

[7] Ivan Stojmenovic, et al. An optimal parallel algorithm for solving the maximal elements problem in the plane, 1988, Parallel Comput.

[8] Anthony K. H. Tung, et al. Efficient Skyline Query Processing on Peer-to-Peer Networks, 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[9] Ben Y. Zhao, et al. Parallelizing Skyline Queries for Scalable Distribution, 2006, EDBT.

[10] Christos Doulkeridis, et al. Angle-based space partitioning for efficient parallel skyline computation, 2008, SIGMOD Conference.

[11] Norbert Zeh, et al. Parallel Computation of Skyline Queries, 2007, 21st International Symposium on High Performance Computing Systems and Applications (HPCS'07).

[12] Andrew Rau-Chaplin, et al. Scalable parallel geometric algorithms for coarse grained multicomputers, 1993, SCG '93.

[13] Zengjian Hu, et al. On weighted balls-into-bins games, 2005, Theor. Comput. Sci.

[14] Jan Chomicki, et al. Skyline with presorting, 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[15] Christopher Olston, et al. Building a High-Level Dataflow System on top of MapReduce: The Pig Experience, 2009, Proc. VLDB Endow.

[16] Jarek Gryz, et al. Maximal Vector Computation in Large Data Sets, 2005, VLDB.

[17] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[18] Bernhard Seeger, et al. Progressive skyline computation in database systems, 2005, TODS.

[19] Jing Yang, et al. Efficient parallel skyline processing using hyperplane projections, 2011, SIGMOD '11.

[20] Martin Raab, et al. "Balls into Bins" - A Simple and Tight Analysis, 1998, RANDOM.

[21] Jeffrey D. Ullman, et al. Optimizing joins in a map-reduce environment, 2010, EDBT '10.

[22] Ken C. K. Lee, et al. Approaching the Skyline in Z Order, 2007, VLDB.

[23] Joseph M. Hellerstein, et al. The declarative imperative: experiences and conjectures in distributed logic, 2010, SGMD.