Practical Aspects for Effective Monitoring of SLAs in Cloud Computing and Virtual Platforms

Cloud computing is transforming the software landscape. Software services are increasingly designed as modular, decoupled components that communicate over a network and are deployed in the Cloud. The Cloud offers three service models, namely Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS). Although this allows better management of resources, the Quality of Service (QoS) in dynamically changing environments such as the Cloud must be legally stipulated in a Service Level Agreement (SLA). This introduces several challenges for SLA enforcement. A key problem is detecting the root cause of performance problems, which may lie in the hosted service or in the deployment platform (PaaS or IaaS), and adjusting resources accordingly. Monitoring and analytics methods have emerged as promising and indispensable solutions in this context, but they require precise, real-time monitoring data. Toward this goal, we assess practical aspects of effective monitoring for SLA-aware services hosted in the Cloud. We present two real-world application scenarios from which we derive requirements, and we present the prototype of our Monitoring and Analytics framework. We claim that this work provides the necessary foundations for research on SLA-aware root cause analysis algorithms in a realistic setup.
