Reviewing Cloud Monitoring: Towards Cloud Resource Profiling

Cloud data centres are monitored on two levels: providers observe the physical infrastructure, while customers observe their virtual resources. Both monitor their hardware and software to understand load patterns and to detect malfunctions and bottlenecks. The motivation for cloud monitoring on both the virtual and the physical level can be summarised in three use cases: alerting, resource allocation, and visualisation. Typical cloud monitoring solutions transfer all metrics of all systems under observation to central stores such as time series databases. Applications then query and aggregate these monitoring data to compute their results. In large data centres, the amount of data grows with the number of observed systems and leads to considerable overhead. In addition, monitoring on both the virtual and the physical level duplicates this overhead. We present an approach that monitors resource statistics on the physical level only and provides resource utilisation profiles to cloud middleware and customers instead of storing the raw time series data. The approach first revisits the metrics necessary for hardware-independent resource profiles, also considering overbooked physical servers. A profile consists of a static part (e.g. CPU cores) and a dynamic part (e.g. changing utilisation), and is based on statistical computations such as histograms and Markov chains.
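
To make the idea of a profile more concrete, the following is a minimal sketch of how a static part and a dynamic part could be derived from a utilisation time series. It is not the authors' implementation: the names `ResourceProfile` and `build_profile`, the equal-width binning, and the first-order Markov chain over bins are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResourceProfile:
    """Hypothetical profile layout; field names are illustrative assumptions."""
    cpu_cores: int                  # static part
    histogram: List[float]          # dynamic part: share of samples per utilisation bin
    transitions: List[List[float]]  # dynamic part: P(next bin | current bin)


def build_profile(cpu_cores: int, utilisation: List[float], bins: int = 4) -> ResourceProfile:
    """Summarise utilisation samples in [0, 1] instead of keeping the raw series."""
    if not utilisation:
        raise ValueError("at least one utilisation sample is required")

    # Map every sample to an equal-width bin; 1.0 falls into the last bin.
    states = [min(int(u * bins), bins - 1) for u in utilisation]

    # Histogram: relative frequency of each bin.
    counts = [0] * bins
    for s in states:
        counts[s] += 1
    histogram = [c / len(states) for c in counts]

    # First-order Markov chain: count transitions between consecutive samples.
    trans_counts = [[0] * bins for _ in range(bins)]
    for current, following in zip(states, states[1:]):
        trans_counts[current][following] += 1

    # Normalise each row to probabilities; rows never visited stay all zero.
    transitions = []
    for row in trans_counts:
        total = sum(row)
        transitions.append([c / total if total else 0.0 for c in row])

    return ResourceProfile(cpu_cores, histogram, transitions)


profile = build_profile(cpu_cores=8, utilisation=[0.1, 0.1, 0.6, 0.9, 0.1])
```

In this sketch the profile size depends only on the number of bins, not on the number of samples, which reflects the motivation stated above for providing profiles rather than raw time series.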
