Cloudy with high chance of DBMS: a 10-year prediction for Enterprise-Grade ML

Machine learning (ML) has proven itself in high-value web applications such as search ranking and is emerging as a powerful tool in a much broader range of enterprise scenarios, including voice recognition and conversational understanding for customer support, autotuning for videoconferencing, intelligent feedback loops in large-scale sysops, manufacturing and autonomous vehicle management, and complex financial predictions, to name a few. Meanwhile, as the value of data is increasingly recognized and monetized, concerns about securing valuable data and about risks to individual privacy have been growing. Consequently, rigorous data management has emerged as a key requirement in enterprise settings. How will these trends (the growing popularity of ML and stricter data governance) intersect? What are the unmet requirements for applying ML in enterprise settings? What technical challenges must the DB community solve? In this paper, we present our vision of how ML and database systems are likely to come together, and the early steps we are taking towards making this vision a reality.
