As modern main-memory-optimized data systems increasingly rely on fast scans, lightweight indexes that enable data skipping play a crucial role in filtering data and reducing system I/O. Scans benefit from data skipping when the data is sorted, semi-sorted, or composed of clustered values. However, data skipping loses effectiveness over arbitrary data distributions. Applying data skipping techniques to non-sorted data can significantly degrade query performance, since the extra cost of metadata reads results in no corresponding scan performance gain. We introduce adaptive data skipping as a framework for structures and techniques that respond to a wide range of data distributions and query workloads. We present an adaptive zonemap design and its implementation on a main-memory column-store prototype to demonstrate that adaptive data skipping has the potential for a 1.4X speedup.
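To make the trade-off concrete, the following is a minimal sketch of a zonemap-assisted range scan; it is not the design evaluated in this work. A zonemap stores the minimum and maximum value of each fixed-size zone of a column, so a scan can skip any zone whose range cannot satisfy the predicate. The names `Zone`, `build_zonemap`, and `scan_range`, and the parameters `probe` and `min_skip_rate`, are hypothetical. The rule that stops consulting metadata once few zones are being skipped is an illustrative assumption standing in for an adaptive policy.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Zone:
    start: int  # index of the first row in the zone
    end: int    # one past the index of the last row in the zone
    lo: int     # minimum value stored in the zone
    hi: int     # maximum value stored in the zone


def build_zonemap(column: List[int], zone_size: int) -> List[Zone]:
    """Compute min/max metadata for each fixed-size zone of the column."""
    zones = []
    for start in range(0, len(column), zone_size):
        chunk = column[start:start + zone_size]
        zones.append(Zone(start, start + len(chunk), min(chunk), max(chunk)))
    return zones


def scan_range(column: List[int], zones: List[Zone], lo: int, hi: int,
               probe: int = 4, min_skip_rate: float = 0.1) -> Tuple[List[int], int]:
    """Return (row ids with lo <= value <= hi, number of zones skipped).

    Assumed adaptive rule (illustrative only): after `probe` zones, if the
    fraction of skipped zones is below `min_skip_rate`, stop reading zone
    metadata and fall back to a plain scan for the remaining zones.
    """
    matches: List[int] = []
    skipped = 0
    use_zonemap = True
    for i, zone in enumerate(zones):
        if use_zonemap:
            if zone.hi < lo or zone.lo > hi:
                skipped += 1  # no row in this zone can match
                continue
            if i + 1 >= probe and skipped / (i + 1) < min_skip_rate:
                use_zonemap = False  # metadata reads are not paying off
        for row in range(zone.start, zone.end):
            if lo <= column[row] <= hi:
                matches.append(row)
    return matches, skipped


# Sorted data: most zones are skipped. Scattered data: none are.
sorted_col = list(range(100))
scattered_col = [(i * 37) % 100 for i in range(100)]
sorted_result = scan_range(sorted_col, build_zonemap(sorted_col, 10), 25, 34)
scattered_result = scan_range(scattered_col, build_zonemap(scattered_col, 10), 25, 34)
```

On the sorted column only the two zones overlapping the predicate are scanned, whereas on the scattered column every zone spans nearly the full value range, so the metadata checks are pure overhead until the sketch disables them.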