On Challenges in Machine Learning Model Management

The training, maintenance, deployment, monitoring, organization and documentation of machine learning (ML) models – in short, model management – is a critical task in virtually all production ML use cases. Poor model management decisions can degrade the performance of an ML system and lead to high maintenance costs. Because research on both infrastructure and algorithms is evolving quickly, challenges and best practices for ML model management are not yet well understood. The field has therefore received increasing attention in recent years from both the data management and the ML communities. In this paper, we discuss a selection of ML use cases, develop an overview of the conceptual, engineering, and data-processing-related challenges arising in the management of the corresponding ML models, and point out future research directions.
