Proactive Scalability and Management of Resources in Hybrid Clouds via Machine Learning

In this paper, we present a novel framework for managing and optimizing applications that are subject to software anomalies and deployed on large-scale cloud architectures composed of multiple geographically distributed cloud regions. The framework uses machine learning models to predict failures caused by the accumulation of anomalies. It introduces a novel workload balancing approach and a proactive technique for scaling the system up and down. We developed a prototype of the framework and present experiments that validate the applicability of the proposed approaches.
