Orchestrator conversation: Distributed management of cloud applications

Managing cloud applications is complex, and the current state of the art does not address this issue. The ever-growing software ecosystem continues to increase the knowledge required to manage cloud applications at a time when there is already an IT skills shortage. Solving this issue requires capturing IT operations knowledge in software so that it can be reused by system administrators who lack it. The presented research tackles this issue by introducing a new and fundamentally different approach to cloud application management: a hierarchical collection of independent software agents that collectively manage the cloud application. Each agent encapsulates knowledge of how to manage specific parts of the cloud application, is driven by sending and receiving cloud models, and collaborates with other agents through conversations. The entirety of communication and collaboration in this collection is called the orchestrator conversation. A thorough evaluation shows that the orchestrator conversation makes it possible to encapsulate IT operations knowledge that current solutions cannot, reduces the complexity of managing a cloud application, and is inherently concurrent. The evaluation also shows that the conversation works out how to deploy a single big data cluster in less than 100 milliseconds and scales linearly to less than 10 seconds for 100 clusters, a minimal overhead compared with the deployment time of at least 20 minutes with the state of the art.
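
The agent hierarchy described above can be pictured with a minimal sketch. It is not the authors' implementation: the `Agent` class, the `converse` method, and the dictionary-shaped "models" are hypothetical names chosen for illustration, and a thread pool stands in for whatever concurrency mechanism the real system uses. Each agent receives a request model, delegates to its child agents concurrently, and replies with a model that aggregates their answers.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent that manages one part of a cloud application."""

    name: str
    children: list[Agent] = field(default_factory=list)

    def converse(self, request: dict) -> dict:
        """Receive a request model and reply with an aggregated model."""
        # Assumption: every child receives the same request model.
        # A real agent would derive a specific sub-model per child.
        with ThreadPoolExecutor() as pool:
            # pool.map runs the child conversations concurrently and
            # returns their replies in the order of self.children.
            parts = list(pool.map(lambda c: c.converse(request), self.children))
        return {"agent": self.name, "goal": request["goal"], "parts": parts}


if __name__ == "__main__":
    # Illustrative hierarchy: one cluster agent with two component agents.
    cluster = Agent("big-data-cluster", [Agent("hadoop"), Agent("spark")])
    print(cluster.converse({"goal": "deploy"}))
```

In this sketch the parent holds no knowledge about its children beyond the models they exchange, which reflects the encapsulation the abstract describes: each agent can be replaced or extended without changing the others.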
