Incremental View Maintenance with Triple Lock Factorization Benefits

We introduce F-IVM, a unified incremental view maintenance (IVM) approach for a variety of tasks, including gradient computation for learning linear regression models over joins, matrix chain multiplication, and factorized evaluation of conjunctive queries. F-IVM is a higher-order IVM algorithm that reduces the maintenance of the given task to the maintenance of a hierarchy of increasingly simpler views. The views are functions mapping keys, which are tuples of input data values, to payloads, which are elements from a task-specific ring. Whereas the computation over the keys is the same for all tasks, the computation over the payloads depends on the task. F-IVM achieves efficiency by factorizing the computation of the keys, payloads, and updates. We implemented F-IVM as an extension of DBToaster. We show in a range of scenarios that it can outperform classical first-order IVM, DBToaster's fully recursive higher-order IVM, and plain recomputation by orders of magnitude while using less memory.
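To make the key/payload idea concrete, here is a minimal sketch of higher-order maintenance for one toy aggregate, Q = SUM over a, b, c of R(a,b) * S(b,c). It is not the F-IVM implementation or DBToaster's API. The names `Ring`, `INT_RING`, and `JoinAggregate` are hypothetical, and the ring of integers (tuple multiplicities) stands in for a task-specific ring. The sketch assumes single-tuple updates, with a negative payload encoding a delete.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Ring:
    """Hypothetical payload ring: a zero element plus addition and multiplication."""
    zero: Any
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]


# Integers under + and *: payloads are tuple multiplicities, so Q is a count.
INT_RING = Ring(0, lambda x, y: x + y, lambda x, y: x * y)


class JoinAggregate:
    """Maintains Q = SUM_{a,b,c} R(a,b) * S(b,c) through two simpler views.

    V_R(b) = SUM_a R(a,b) and V_S(b) = SUM_c S(b,c) map the join key b to a
    payload, so that Q = SUM_b V_R(b) * V_S(b).
    """

    def __init__(self, ring: Ring):
        self.ring = ring
        self.v_r = {}       # view over R, keyed by b
        self.v_s = {}       # view over S, keyed by b
        self.q = ring.zero  # top-level aggregate

    def update_r(self, a, b, delta):
        """Apply the update R(a,b) += delta; a is marginalized away."""
        r = self.ring
        # The delta of Q is delta * V_S(b): one lookup, no join recomputation.
        self.q = r.add(self.q, r.mul(delta, self.v_s.get(b, r.zero)))
        self.v_r[b] = r.add(self.v_r.get(b, r.zero), delta)

    def update_s(self, b, c, delta):
        """Apply the update S(b,c) += delta; c is marginalized away."""
        r = self.ring
        self.q = r.add(self.q, r.mul(self.v_r.get(b, r.zero), delta))
        self.v_s[b] = r.add(self.v_s.get(b, r.zero), delta)


m = JoinAggregate(INT_RING)
m.update_r(1, 10, 1)    # insert R(1, 10)
m.update_r(2, 10, 1)    # insert R(2, 10)
m.update_s(10, "x", 1)  # insert S(10, "x")
m.update_s(10, "y", 1)  # insert S(10, "y")
m.update_r(1, 10, -1)   # delete R(1, 10)
print(m.q)
```

The update logic touches only keys; swapping `INT_RING` for another ring changes what the payloads compute without changing that logic, which is the separation the abstract describes.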
