Machine Learning for Achieving Self-* Properties and Seamless Execution of Applications in the Cloud

Software anomalies are recognized as a major problem affecting the performance and availability of many computer systems. The accumulation of anomalies of different kinds, such as memory leaks and unterminated threads, may cause a system either to fail or to run at suboptimal performance levels. This problem particularly affects web servers, where hosted applications are typically intended to run continuously, which increases both the probability that anomalies accumulate and the severity of their effects. Because anomalies occur unpredictably, continuous system monitoring would be required to detect impending system failures and/or excessive performance degradation in time to start a recovery procedure. In this paper, we present a Machine Learning-based framework for the proactive management of client-server applications in the cloud. Using optimized Machine Learning models and continuously measured system features, the framework predicts the remaining time until an unexpected event (system failure, service level agreement violation, etc.) occurs on a virtual machine hosting a server instance of the application. The framework can manage virtual machines in the presence of different types of anomalies and different anomaly occurrence patterns. We show the effectiveness of the proposed solution by presenting the results of a set of experiments carried out in a real-world-inspired scenario.
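The abstract describes predicting, from continuously measured system features, the remaining time until a failure or SLA violation on a virtual machine, but it does not name the models involved. The sketch below illustrates the idea under explicit assumptions: a plain linear least-squares regressor (optionally ridge-regularized) stands in for the paper's "optimized Machine Learning models"; the features (memory used in MB, active threads), the training data, and the safety threshold are hypothetical; and `fit_rttf`, `RttfModel`, and `needs_rejuvenation` are illustrative names, not the framework's API. It uses only the Python standard library.

```python
from dataclasses import dataclass
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate the current column from all rows below the pivot.
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


@dataclass
class RttfModel:
    """Hypothetical linear model: remaining time ~ intercept + sum(w_i * feature_i)."""
    intercept: float
    weights: List[float]

    def predict(self, features: Sequence[float]) -> float:
        return self.intercept + sum(w * f for w, f in zip(self.weights, features))


def fit_rttf(samples: Sequence[Sequence[float]], remaining: Sequence[float],
             ridge: float = 0.0) -> RttfModel:
    """Least-squares fit via the normal equations (X^T X + ridge*I) beta = X^T y.

    A column of ones is prepended for the intercept, which is not regularized.
    """
    if not samples:
        raise ValueError("need at least one training sample")
    rows = [[1.0] + [float(v) for v in s] for s in samples]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    for i in range(1, p):
        xtx[i][i] += ridge
    xty = [sum(r[i] * y for r, y in zip(rows, remaining)) for i in range(p)]
    beta = _solve(xtx, xty)
    return RttfModel(intercept=beta[0], weights=beta[1:])


def needs_rejuvenation(model: RttfModel, features: Sequence[float],
                       threshold_s: float) -> bool:
    """Hypothetical policy: act proactively when predicted time left drops below a margin."""
    return model.predict(features) < threshold_s


if __name__ == "__main__":
    # Hypothetical history: [memory_used_mb, active_threads] -> seconds until failure.
    history = [[100, 10], [200, 12], [300, 20], [400, 25], [500, 40]]
    observed = [3000 - 2 * mem - 5 * thr for mem, thr in history]
    model = fit_rttf(history, observed)
    current = [450, 30]
    print(round(model.predict(current)))               # 1950 for this synthetic data
    print(needs_rejuvenation(model, current, 2000.0))  # True: 1950 s < 2000 s margin
```

A linear model is used here only because its behavior is easy to check by hand on synthetic data; the choice of model family, monitored features, and threshold would in practice be driven by measurements from the target system.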
