Automated Scalability of Cloud Services and Jobs

Many scientific and commercial applications require access to computation, data or networking resources based on dynamically changing requirements. Both users and providers need these applications and services to adjust dynamically to fluctuations in demand and to serve end-users at the required quality of service (performance, reliability, security, etc.) and at optimized cost. This may require the resources of these applications or services to scale up or down automatically. The European-funded COLA (Cloud Orchestration at the Level of Application) project aims to design and develop a generic framework that supports automated scalability for a large variety of applications. Learning from previous similar efforts, and with the aim of reusing existing open source technologies wherever possible, COLA elaborated a modular architecture called MiCADO (Microservices-based Cloud Application-level Dynamic Orchestrator) [1] that provides optimized deployment and run-time orchestration for cloud applications. MiCADO is built from well-defined building blocks that are implemented as microservices, both on the MiCADO Master and on the MiCADO Worker Nodes. This modular design supports various implementations in which components can be replaced relatively easily with alternative technologies. The generic, technology-independent architecture of MiCADO is shown in Figure 1. The current implementation uses widely applied technologies: Docker Swarm as the Container Orchestrator [2], Occopus as the Cloud Orchestrator [3], and Prometheus as the Monitoring System [4]. The user-facing interface of MiCADO is a TOSCA (Topology and Orchestration Specification for Cloud Applications, an OASIS standard) [5] based description of the desired topology and its associated scalability and security policies. This interface can then be embedded in existing GUIs, custom web interfaces or science gateways.
The first prototype implementations of MiCADO show promising results on various application types. Two main application categories are targeted. The first is cloud-based services, where scalability is achieved by scaling the number of containers and virtual machines up or down based on load, performance and cost. The second is the execution of a large number of (typically parameter sweep style) jobs, where a certain number of these jobs must be completed by a set deadline. Direct involvement of industry partners ensures that the results of COLA are prototyped on real application scenarios. Three near-production-quality demonstrators and twenty further proof-of-concept case studies are being implemented using MiCADO, demonstrating its applicability to both service-type and job-type scalability. Some of the prototyped applications are directly related to services used in science gateways, such as the Data Avenue service of WS-PGRADE [6].
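
The two scaling categories can be illustrated with a minimal Python sketch. It is not taken from MiCADO: the function names, parameters and default limits are hypothetical, and the rules shown (proportional scaling towards a target load, and a worker count derived from the time left before a deadline) are simple stand-ins for whatever policies a real deployment would define in its TOSCA description.

```python
import math


def replicas_for_load(current_replicas, observed_load, target_load,
                      min_replicas=1, max_replicas=10):
    """Service-type scaling (hypothetical rule): resize the number of
    containers so that the per-replica load moves towards target_load."""
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    desired = math.ceil(current_replicas * observed_load / target_load)
    # Clamp to the limits the operator allows.
    return max(min_replicas, min(max_replicas, desired))


def workers_for_deadline(jobs_remaining, avg_job_seconds,
                         seconds_to_deadline, max_workers=10):
    """Job-type scaling (hypothetical rule): number of workers needed to
    finish the remaining jobs before the deadline, assuming each worker
    runs one job at a time and jobs take avg_job_seconds on average."""
    if jobs_remaining <= 0:
        return 0
    if seconds_to_deadline <= 0:
        return max_workers  # deadline already passed: use everything allowed
    # Whole jobs a single worker can complete in the time that is left.
    jobs_per_worker = math.floor(seconds_to_deadline / avg_job_seconds)
    if jobs_per_worker < 1:
        return max_workers  # not even one job fits: use everything allowed
    needed = math.ceil(jobs_remaining / jobs_per_worker)
    return min(max_workers, needed)
```

For instance, under these assumptions 1000 remaining one-minute jobs with one hour left would call for 17 workers if the cap allows it, since each worker can complete 60 jobs in that hour.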