Intermittent Query Processing

Many applications ingest data in an intermittent, yet largely predictable, pattern. Existing systems tend to ignore how data arrives when deciding how to update (or refresh) an ongoing query. To address this shortcoming we propose a new query processing paradigm, Intermittent Query Processing (IQP), that bridges query execution and policies to determine when to update results and how many resources to allocate to ensure fast query updates. For a given query, the system provides an initial result that is refreshed when a policy dictates, such as after a defined number of new records arrive or a time interval elapses. Between intermittent data arrivals, IQP inactivates query execution by selectively releasing those resources held during normal execution that, according to the arrival patterns for new records, will be least helpful for future refreshes. We present an IQP prototype based on PostgreSQL that selectively persists the state associated with query operators to allow for fast query updates while constraining resource consumption. Our experiments show that for several application scenarios IQP greatly lowers query processing latency compared to batch systems, and substantially reduces memory consumption at comparable latency compared to a state-of-the-art incremental view maintenance technique.

PVLDB Reference Format: Dixin Tang, Zechao Shang, Aaron J. Elmore, Sanjay Krishnan, Michael J. Franklin. Intermittent Query Processing. PVLDB, 12(11): 1427-1441, 2019. DOI: https://doi.org/10.14778/3342263.3342278
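The refresh policies mentioned in the abstract (refresh after a defined number of new records arrive or after a time interval elapses) can be illustrated with a minimal sketch. This is not the IQP prototype's code: the class `RefreshPolicy`, its fields, and its method names are hypothetical and were chosen only for illustration, and the sketch covers only the trigger decision, not resource release or state persistence.

```python
from dataclasses import dataclass


@dataclass
class RefreshPolicy:
    """Hypothetical trigger: refresh when enough records or enough time has accumulated."""
    max_new_records: int        # assumed threshold on newly arrived records
    max_interval_s: float       # assumed threshold on seconds since the last refresh
    pending_records: int = 0    # records that arrived since the last refresh
    last_refresh_s: float = 0.0

    def on_arrival(self, count: int) -> None:
        # Record that `count` new records arrived; no query work happens here.
        self.pending_records += count

    def should_refresh(self, now_s: float) -> bool:
        # Refresh if either the record-count or the time-interval condition holds.
        return (self.pending_records >= self.max_new_records
                or now_s - self.last_refresh_s >= self.max_interval_s)

    def mark_refreshed(self, now_s: float) -> None:
        # Reset the counters after the query result has been refreshed.
        self.pending_records = 0
        self.last_refresh_s = now_s
```

In this sketch a caller would invoke `on_arrival` as data lands, poll `should_refresh` with the current time, and call `mark_refreshed` once the result has been updated. How the system decides which operator state to keep between refreshes is outside the scope of the example.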
