ModelDB: a system for machine learning model management

Building a machine learning model is an iterative process. A data scientist will build many tens to hundreds of models before arriving at one that meets some acceptance criteria (e.g. AUC cutoff, accuracy threshold). However, the current style of model building is ad-hoc and there is no practical way for a data scientist to manage models that are built over time. As a result, the data scientist must attempt to "remember" previously constructed models and insights obtained from them. This task is challenging for more than a handful of models and can hamper the process of sensemaking. Without a means to manage models, there is no easy way for a data scientist to answer questions such as "Which models were built using an incorrect feature?", "Which model performed best on American customers?" or "How did the two top models compare?" In this paper, we describe our ongoing work on ModelDB, a novel end-to-end system for the management of machine learning models. ModelDB clients automatically track machine learning models in their native environments (e.g. scikit-learn, spark.ml), the ModelDB backend introduces a common layer of abstractions to represent models and pipelines, and the ModelDB frontend allows visual exploration and analyses of models via a web-based interface.

[1]  Ben Shneiderman,et al.  The eyes have it: a task by data type taxonomy for information visualizations , 1996, Proceedings 1996 IEEE Symposium on Visual Languages.

[2]  Steven F. Roth,et al.  Enhancing data exploration with a branching history of user operations , 2001, Knowl. Based Syst..

[3]  A History Mechanism for Visual Data Mining , 2004, IEEE Symposium on Information Visualization.

[4]  Cláudio T. Silva,et al.  VisTrails: enabling interactive multiple-view visualizations , 2005, VIS 05. IEEE Visualization, 2005..

[5]  Cláudio T. Silva,et al.  VisTrails: visualization meets data management , 2006, SIGMOD Conference.

[6]  Cláudio T. Silva,et al.  Managing the Evolution of Dataflows with VisTrails , 2006, 22nd International Conference on Data Engineering Workshops (ICDEW'06).

[7]  Anthony C. Robinson,et al.  Re-Visualization: Interactive Visualization of the Process of Visual Analysis , 2006 .

[8]  Edward A. Lee,et al.  Scientific workflow management and the Kepler system , 2006, Concurr. Comput. Pract. Exp..

[9]  Jeffrey Heer,et al.  Graphical Histories for Visualization: Supporting Analysis, Communication, and Evaluation , 2008, IEEE Transactions on Visualization and Computer Graphics.

[10]  B. Shneiderman,et al.  Interactive Dynamics for Visual Analysis , 2012 .

[11]  Carole A. Goble,et al.  The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud , 2013, Nucleic Acids Res..

[12]  P. Pirolli,et al.  The Sensemaking Process and Leverage Points for Analyst Technology as Identified Through Cognitive Task Analysis , 2007 .