Deep-water framework: The Swiss army knife of humans working with machine learning models

Abstract: Working with machine learning models has become an everyday task not only for software engineers but also for a much wider range of researchers and professionals. Training such models involves finding the best learning methods and the best hyper-parameters for a specific task, keeping track of the achieved performance measures, comparing the results visually, and so on. If we add feature extraction methods to the mix, which precede the learning phase and depend on many hyper-parameters themselves (source code embedding, for example, is quite common in software analysis), the task cries out for supporting tools. We propose a framework called Deep-Water that works similarly to a configuration management tool in software engineering. It supports defining arbitrary feature extraction and learning methods for an input dataset and helps execute all the training tasks with different hyper-parameters in a distributed manner. The framework stores the circumstances, parameters, and results of every training run, which can be filtered and visualized later. We have successfully used the tool in several prediction tasks based on software analysis, such as vulnerability and bug prediction, but it is general enough to be applicable in other areas as well, e.g. NLP, image processing, or even non-IT fields.
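To make the workflow described above more concrete, the following is a minimal sketch of enumerating feature-extraction and learning hyper-parameter combinations, recording each run, and filtering the stored results afterwards. It is not Deep-Water's actual API: the function names (`expand_grid`, `run_experiments`, `toy_evaluate`), the parameter names, and the scoring formula are all hypothetical, and the runs execute sequentially rather than in a distributed manner.

```python
from itertools import product


def expand_grid(grid):
    """Yield one dict per combination of the hyper-parameter values in `grid`."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def run_experiments(extract_grid, learn_grid, evaluate):
    """Call `evaluate` for every extraction/learning config pair and record the run."""
    records = []
    for extract_cfg in expand_grid(extract_grid):
        for learn_cfg in expand_grid(learn_grid):
            score = evaluate(extract_cfg, learn_cfg)
            records.append(
                {"extract": extract_cfg, "learn": learn_cfg, "score": score}
            )
    return records


def toy_evaluate(extract_cfg, learn_cfg):
    # Hypothetical stand-in for real feature extraction and model training;
    # the returned score is made up purely for illustration.
    return extract_cfg["vector_size"] / 100 - learn_cfg["learning_rate"]


# Hypothetical hyper-parameter grids for an embedding step and a learner.
extract_grid = {"vector_size": [50, 100], "window": [5]}
learn_grid = {"learning_rate": [0.1, 0.01], "layers": [2, 3]}

records = run_experiments(extract_grid, learn_grid, toy_evaluate)

# Stored results can be queried later, e.g. the best run or a filtered subset.
best = max(records, key=lambda r: r["score"])
deep_runs = [r for r in records if r["learn"]["layers"] == 3]
```

Keeping the full configuration next to each score is what makes later filtering and comparison possible; a real tool would additionally persist the records and dispatch the runs to worker machines.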
