Incremental Maintenance of Regression Models over Joins

This paper introduces a principled incremental view maintenance (IVM) mechanism for in-database computation described by rings. We exemplify our approach by introducing the covariance matrix ring that we use for learning linear regression models over arbitrary equi-join queries. Our approach is a higher-order IVM algorithm that exploits the factorized structure of joins and aggregates to avoid redundant computation and improve performance. We implemented it in DBToaster, which uses program synthesis to generate high-performance maintenance code. We experimentally show that it can outperform first-order and fully recursive higher-order IVM as well as recomputation by orders of magnitude while using less memory.
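To make the idea of a covariance matrix ring concrete, the following is a minimal sketch, not the paper's implementation. It assumes ring elements are triples (c, s, Q) holding a tuple count, a vector of per-feature sums, and a matrix of sums of pairwise products, and it assumes a hypothetical two-relation join R(A, B) ⋈ S(A, C). The names `Cov`, `lift`, `build_views`, `total`, and `insert_R` are illustrative only. The sketch keeps one partial aggregate per join key and applies a single-tuple insert as a delta, without touching the join result.

```python
from collections import defaultdict

N = 3  # hypothetical feature order: A=0, B=1, C=2


class Cov:
    """Assumed ring element (c, s, Q): count, sums, sums of products."""

    def __init__(self, c=0, s=None, Q=None):
        self.c = c
        self.s = list(s) if s is not None else [0] * N
        self.Q = [list(r) for r in Q] if Q is not None else [[0] * N for _ in range(N)]

    def __add__(self, o):
        # Ring addition is component-wise.
        return Cov(self.c + o.c,
                   [a + b for a, b in zip(self.s, o.s)],
                   [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(self.Q, o.Q)])

    def __mul__(self, o):
        # Ring product: counts multiply, sums and products are scaled by the
        # other side's count, and cross terms s1*s2^T + s2*s1^T are added.
        c = self.c * o.c
        s = [o.c * a + self.c * b for a, b in zip(self.s, o.s)]
        Q = [[o.c * self.Q[i][j] + self.c * o.Q[i][j]
              + self.s[i] * o.s[j] + o.s[i] * self.s[j]
              for j in range(N)] for i in range(N)]
        return Cov(c, s, Q)

    def as_tuple(self):
        return (self.c, tuple(self.s), tuple(tuple(r) for r in self.Q))


def lift(j, x):
    """Map value x of feature j to the ring element (1, x*e_j, x^2*E_jj)."""
    e = Cov(1)
    e.s[j] = x
    e.Q[j][j] = x * x
    return e


def build_views(rel, j):
    """Per-join-key partial aggregate over a binary relation (A, feature j)."""
    views = defaultdict(Cov)
    for a, x in rel:
        views[a] = views[a] + lift(j, x)
    return views


def total(VR, VS):
    """Aggregate over the join, computed from the views, never from join tuples."""
    t = Cov()
    for a in VR:
        if a in VS:
            t = t + lift(0, a) * VR[a] * VS[a]
    return t


def insert_R(t, VR, VS, a, b):
    """Apply the insert of (a, b) into R as a delta against the S view."""
    delta = lift(0, a) * lift(1, b) * VS.get(a, Cov())
    VR[a] = VR[a] + lift(1, b)
    return t + delta


def naive(R, S):
    """Reference: materialize the join and accumulate (c, s, Q) tuple by tuple."""
    t = Cov()
    for a, b in R:
        for a2, c in S:
            if a == a2:
                x = (a, b, c)
                t = t + Cov(1, x, [[xi * xj for xj in x] for xi in x])
    return t
```

In this sketch the product in `__mul__` is only meaningful when the two operands cover disjoint features, which holds here because each relation contributes its own column. The resulting Q and s are the sufficient statistics from which a linear regression model can be trained without revisiting the data. Deletes could be handled with negated elements, which the sketch omits.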
