Paper Information - Adaptive Data Skipping in Main-Memory Systems

Adaptive Data Skipping in Main-Memory Systems

As modern main-memory optimized data systems increasingly rely on fast scans, lightweight indexes that allow for data skipping play a crucial role in data filtering to reduce system I/O. Scans benefit from data skipping when the data is sorted, semi-sorted, or composed of clustered values. However, data skipping loses effectiveness over arbitrary data distributions. Applying data skipping techniques over non-sorted data can significantly decrease query performance, since the extra cost of metadata reads results in no corresponding scan performance gains. We introduce adaptive data skipping as a framework for structures and techniques that respond to a vast array of data distributions and query workloads. We present an adaptive zonemaps design and its implementation on a main-memory column store prototype to demonstrate that adaptive data skipping has the potential for a 1.4X speedup.
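
To make the abstract's central observation concrete, the following is a minimal sketch of a plain (non-adaptive) zonemap over an in-memory column. It is not the authors' design or implementation; the names `Zone`, `build_zonemap`, and `range_scan`, the fixed zone size, and the toy data are all assumptions chosen for illustration. It only shows why per-zone min/max metadata helps on sorted data and adds pure overhead on interleaved data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Zone:
    start: int  # index of the first value covered by this zone
    end: int    # one past the index of the last value covered
    lo: int     # minimum value inside the zone
    hi: int     # maximum value inside the zone


def build_zonemap(column: List[int], zone_size: int) -> List[Zone]:
    """Split the column into fixed-size zones and record min/max per zone."""
    zones = []
    for start in range(0, len(column), zone_size):
        chunk = column[start:start + zone_size]
        zones.append(Zone(start, start + len(chunk), min(chunk), max(chunk)))
    return zones


def range_scan(column: List[int], zones: List[Zone],
               low: int, high: int) -> Tuple[List[int], int]:
    """Return positions where low <= value <= high, plus the number of zones skipped."""
    hits = []
    skipped = 0
    for z in zones:
        # The zone's [lo, hi] range does not overlap the query range: skip it.
        if z.hi < low or z.lo > high:
            skipped += 1
            continue
        # Otherwise every value in the zone must be checked.
        for i in range(z.start, z.end):
            if low <= column[i] <= high:
                hits.append(i)
    return hits, skipped


# Hypothetical toy columns holding the same 16 values in different orders.
sorted_col = list(range(16))
scattered_col = [0, 15, 1, 14, 2, 13, 3, 12, 4, 11, 5, 10, 6, 9, 7, 8]

sorted_hits, sorted_skipped = range_scan(
    sorted_col, build_zonemap(sorted_col, 4), 5, 6)
scattered_hits, scattered_skipped = range_scan(
    scattered_col, build_zonemap(scattered_col, 4), 5, 6)
```

With a zone size of 4 and the query range [5, 6], the sorted column skips three of its four zones, while the interleaved column skips none and still pays for reading every zone's metadata. An adaptive scheme in the spirit of the abstract would react to this, for example by resizing or disabling zones that never allow skipping, but the policy for doing so is not specified here.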

Stratos Idreos | Wilson Qin
