Verdict: A System for Stochastic Query Planning

Barzan Mozafari

Online services, wireless devices, and scientific simulations are all generating massive volumes of data at unprecedented rates. This abundance of rich datasets has made data-driven discovery the predominant approach in biology, medicine, physics, economics, and even the social sciences. Ironically, existing data processing tools have now become the bottleneck in data-driven activities. When faced with sufficiently large datasets (say, a few terabytes), even the fastest database systems can take hours or days to answer the simplest queries (see [1]). Such response times are simply unacceptable to many users and applications.

Data-driven discovery is often an interactive and iterative process: data scientists form a hypothesis, consult the data, adjust their hypothesis accordingly, and repeat until they reach a satisfactory answer. Slow and costly interactions with data can therefore severely inhibit data scientists’ productivity, engagement, and even creativity.

Driven by the growing market for interactive analytics, both commercial and open-source data warehouses continuously strive to provide interactive response times through various optimizations, such as parallelism, indexing, materialization, better query plans, data compression, columnar formats, and even in-memory processing. These optimizations are in essence no different from mainstream database research on query processing, whose goals for the past four decades have simply been to:

[1] S. Agarwal, B. Mozafari, A. Panda, H. Milner, S. Madden, and I. Stoica. BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data. In EuroSys '13, 2013.