Massively parallel skyline computation for processing-in-memory architectures

Processing-In-Memory (PIM) is an increasingly popular architecture that addresses the 'memory wall' crisis by integrating processors within DRAM. It promises low data access latency, high bandwidth, massive parallelism, and low power consumption. The skyline operator is a well-known primitive that identifies the multi-dimensional points offering optimal trade-offs within a given dataset. For large multi-dimensional datasets, computing the skyline is both compute- and data-intensive. Although PIM systems present opportunities to mitigate this cost, their execution model relies on all processors operating in isolation with minimal data exchange. This prohibits the direct application of known skyline optimizations, which are inherently sequential: they create dependencies and large intermediate results that limit the achievable parallelism and throughput and require an expensive merging phase. In this work, we address these challenges by introducing the first skyline algorithm for PIM architectures, called DSky. It is designed to be massively parallel and throughput-efficient by leveraging a novel work assignment strategy that emphasizes load balancing. Our experiments demonstrate that DSky outperforms state-of-the-art algorithms for CPUs and GPUs in most cases, achieving 2× to 14× higher throughput than those solutions. Furthermore, we showcase DSky's good scaling properties, which are intertwined with PIM's ability to allocate resources with minimal added cost. In addition, we show an order of magnitude lower energy consumption compared to CPUs and GPUs.
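
To make the skyline operator concrete, the following is a minimal sketch of the dominance test and a naive quadratic skyline computation. It is not DSky and does not reflect any PIM-specific optimization; the names `dominates` and `naive_skyline`, the sample data, and the "smaller is better in every dimension" convention are assumptions chosen for illustration.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q.

    Assumed convention: smaller values are better in every dimension.
    p dominates q when p is no worse than q everywhere and strictly
    better in at least one dimension.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def naive_skyline(points: List[Point]) -> List[Point]:
    """Keep every point that no other point dominates (O(n^2) comparisons)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance to the beach); lower is better for both.
hotels = [(50, 8), (60, 3), (80, 2), (70, 5), (90, 9)]
skyline = naive_skyline(hotels)
print(skyline)
```

In this sample, (70, 5) is dominated by (60, 3) and (90, 9) is dominated by (50, 8), so only the three remaining points form the skyline. Every point is compared against every other point, which illustrates why the computation becomes expensive on large datasets.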
