GPU Accelerated Top-K Selection With Efficient Early Stopping

Top-k selection retrieves the k highest-ranking tuples from a given relation, ranked by a user-defined monotone function. Efficient query processing skips the evaluation of low-ranking tuples through early termination. This is typically achieved with sophisticated data organization schemes combined with random accesses to resolve score ambiguity. Although these practices have proven successful for CPU-based systems operating on disk-resident data, they have yet to be tested on modern in-memory systems that use GPU-based processing. The problem is hard to tackle because random accesses are necessary for high algorithmic efficiency (i.e., a low number of object evaluations), yet they are inherently detrimental to GPU-based processing. Existing solutions that rely on data reordering support sequential access at the expense of more object evaluations. In our work, we investigate the effects of data preordering combined with intelligent partitioning to enable efficient early termination on GPUs. We evaluate the proposed solutions when data reside in either device or host memory. Our experimental results demonstrate the high potential of our methods for a variety of query parameters and data distributions. We show 2× to 200× lower query latency (with data in device and host memory, respectively) compared with state-of-the-art solutions that require evaluating all tuples in a given relation.
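
To make the idea of early termination over preordered, partitioned data concrete, the following is a minimal CPU-side sketch and not the method evaluated in this work. It assumes a weighted-sum scoring function with non-negative weights (one instance of a monotone function), a preordering of tuples by their largest attribute, and fixed-size blocks; the names `build_blocks` and `topk_early_stop` are hypothetical. Because every remaining tuple has all attributes no larger than the leading value of the current block, that value times the sum of weights bounds every remaining score, and the scan can stop once the k-th best score reaches the bound.

```python
import heapq


def build_blocks(rows, block_size):
    """Preorder rows by their largest attribute (descending), then cut into blocks.

    Assumption for illustration: this ordering is query independent, so it can
    be computed once and scanned sequentially at query time.
    """
    ordered = sorted(rows, key=max, reverse=True)
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]


def topk_early_stop(blocks, weights, k):
    """Return (top-k scores in descending order, number of rows evaluated)."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative for the bound to hold")
    total_weight = sum(weights)
    heap = []        # min-heap holding the k best scores seen so far
    evaluated = 0
    for block in blocks:
        # Every attribute of this row and all later rows is <= max(block[0]),
        # so no remaining score can exceed this bound.
        bound = max(block[0]) * total_weight
        if len(heap) == k and heap[0] >= bound:
            break    # early termination: remaining rows cannot enter the top-k
        for row in block:
            score = sum(w * a for w, a in zip(weights, row))
            evaluated += 1
            if len(heap) < k:
                heapq.heappush(heap, score)
            elif score > heap[0]:
                heapq.heapreplace(heap, score)
    return sorted(heap, reverse=True), evaluated
```

Blocks are scanned sequentially and the stopping test runs once per block, which is the access pattern that suits GPUs; in a GPU setting each block would plausibly map to a unit of parallel work, but that mapping is left out of this sketch.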
