MODELDB: Opportunities and Challenges in Managing Machine Learning Models

Machine learning applications have become ubiquitous in a variety of domains. Powering each of these ML applications are one or more machine learning models that are used to make key decisions or compute key quantities. The life-cycle of an ML model starts with data processing and proceeds through feature engineering, model experimentation, deployment, and maintenance. We call the process of tracking a model across all phases of its life-cycle model management. In this paper, we discuss the current need for model management and describe MODELDB, the first open-source model management system developed at MIT. We also discuss the changing landscape and the growing challenges and opportunities in managing models.
