AME-WPC: Advanced model for efficient workload prediction in the cloud

Abstract: Workload estimation and prediction have become a highly relevant research area in cloud computing because of their many benefits, which include QoS (Quality of Service) satisfaction, automatic resource scaling, and job/task scheduling. Accurately predicting the workload of cloud applications is very difficult when that workload varies drastically. To address this issue, existing solutions use either statistical methods, which detect repeating patterns effectively but provide poor accuracy for long-term predictions, or learning methods, which build a complex prediction model but are mostly unable to detect unusual patterns. Some solutions combine both kinds of methods. However, none of them addresses the issue of gathering system-specific information in order to improve prediction accuracy. We propose an Advanced Model for Efficient Workload Prediction in the Cloud (AME-WPC), which combines statistical and learning methods, improves the accuracy of workload prediction for cloud computing applications, and can be dynamically adapted to a particular system. The learning methods use an extended training dataset, which we define by analyzing the system factors that strongly influence the application workload. We treat the workload prediction problem as both a classification and a regression problem, and test our solution with the machine-learning method Random Forest on both the basic and the extended training data. To evaluate the proposed model, we compare its empirical results against those of the machine-learning method kNN (k-Nearest Neighbors). Experimental results demonstrate that combining statistical and learning methods is worthwhile and can significantly improve the accuracy of workload prediction over time.
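The idea of a basic versus an extended training dataset can be illustrated with a minimal sketch. It is not the AME-WPC implementation: the synthetic workload, the lag window, and the hour-of-day encoding used as a stand-in for a "system factor" are all assumptions made for illustration, and a small hand-written kNN regressor is used instead of Random Forest only to keep the example free of external libraries. All function names are hypothetical.

```python
import math


def make_workload(n_hours):
    """Synthetic hourly request counts with a daily cycle (illustrative data only)."""
    return [
        100 + 50 * math.sin(2 * math.pi * (t % 24) / 24) + (t * 7 % 11)
        for t in range(n_hours)
    ]


def build_dataset(series, n_lags, extended):
    """Return (features, target) rows.

    Basic features: the previous n_lags workload values.
    Extended features: the same lags plus an hour-of-day encoding, an assumed
    example of a system factor that influences the workload.
    """
    rows = []
    for t in range(n_lags, len(series)):
        features = list(series[t - n_lags:t])
        if extended:
            angle = 2 * math.pi * (t % 24) / 24
            # Scaled so the encoding is comparable in magnitude to the lags.
            features += [50 * math.sin(angle), 50 * math.cos(angle)]
        rows.append((features, series[t]))
    return rows


def knn_predict(train, features, k):
    """kNN regression: average the targets of the k closest training rows."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], features))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def mean_absolute_error(train, test, k):
    """Mean absolute error of kNN predictions over the test rows."""
    errors = [abs(knn_predict(train, f, k) - y) for f, y in test]
    return sum(errors) / len(errors)


def evaluate(n_hours=24 * 14, n_lags=3, k=3, test_size=48):
    """Compare basic and extended feature sets on the last test_size hours."""
    series = make_workload(n_hours)
    results = {}
    for name, extended in (("basic", False), ("extended", True)):
        rows = build_dataset(series, n_lags, extended)
        train, test = rows[:-test_size], rows[-test_size:]
        results[name] = mean_absolute_error(train, test, k)
    return results


if __name__ == "__main__":
    for name, mae in evaluate().items():
        print(f"{name}: MAE = {mae:.2f}")
```

Running the script prints one mean absolute error per feature set. On real traces, the choice of extra features would come from an analysis of the specific system, as the abstract describes, rather than from a fixed hour-of-day encoding.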
