CrocodileDB: Efficient Database Execution through Intelligent Deferment

The end of Moore’s law will push database system designers to be more judicious with computation, as the growth in data outpaces the availability of computational resources. Eagerness, meaning aggressively consuming resources to complete the task at hand as quickly as possible, is one source of waste in modern data systems: an eager system expends resources unnecessarily while waiting on queries, data, or both. Intelligently deferring a task to a later point in time can increase result reuse, reduce work that might later be invalidated, or avoid unnecessary work altogether. We propose a research prototype, CrocodileDB, a resource-efficient database system that automatically optimizes deferment based on user specification and workload prediction. CrocodileDB integrates new ways of specifying timing information, new query execution policies, new task schedulers, and new data loading schemes.
