Model-driven deployment and management of workflows on analytics frameworks

The data science skills shortage means that those who have the knowledge are under constant pressure to do more with less. While data science tools are improving at a staggering pace, the operational tools around them cannot keep up. Even researchers at Google state that automatic configuration and dependency management of services is still an “open, hard problem”. As a result, data scientists either have to solve operational challenges themselves or have to collaborate closely and continuously with a skilled operations team. This paper addresses the operational challenges of deploying and managing workflows on top of analytics platforms, starting from three key requirements: (1) data scientists want to model their workflows in a reusable way; (2) this model should be automatically deployed, managed and connected to other services; and (3) the solution should be compatible with existing cloud modeling languages, infrastructure, analytics platforms and tools. The paper explores where the state of the art falls short of these requirements, proposes an architecture to solve the open challenges, and implements and evaluates this architecture.
