A mixture of HMM, GA, and Elman network for load prediction in cloud-oriented data centers

The rapid growth in demand for computational power from scientific, business, and Web applications has led to the emergence of cloud-oriented data centers. These centers provide pay-as-you-go execution environments that scale transparently to the user. Load prediction is an important enabler of cost-optimal resource allocation and energy saving in a cloud computing environment. Traditional linear or nonlinear prediction models that forecast future load directly from historical information appear less effective, and classifying the load before prediction is necessary to improve prediction accuracy. In this paper, a novel approach is proposed to forecast the future load of cloud-oriented data centers. First, a hidden Markov model (HMM) based data clustering method is adopted to classify the cloud load. The Bayesian information criterion (BIC) and the Akaike information criterion (AIC) are employed to automatically determine the optimal HMM size and the number of clusters. The trained HMMs are then used to identify the most appropriate cluster, namely the one with the maximum likelihood for the current load. With the data from this cluster, an Elman network optimized by a genetic algorithm (GA) is used to forecast the future load. Experimental results show that our algorithm outperforms other approaches reported in previous works.
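
The model-selection and cluster-identification steps can be illustrated with a minimal Python sketch. It assumes one-dimensional Gaussian emissions and hand-picked HMM parameters, and every name in it (`forward_log_likelihood`, `hmm_param_count`, `most_likely_cluster`, `low`, `high`) is hypothetical; none of this is taken from the paper. HMM training and the GA-optimized Elman network are omitted.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian (assumed emission model)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def forward_log_likelihood(obs, model):
    """Scaled forward algorithm: returns log P(obs | model)."""
    pi, A = model["pi"], model["A"]
    means, stds = model["means"], model["stds"]
    n = len(pi)
    alpha = [pi[i] * gaussian_pdf(obs[0], means[i], stds[i]) for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n))
                * gaussian_pdf(obs[t], means[j], stds[j])
                for j in range(n)
            ]
        scale = sum(alpha)  # rescale each step to avoid numerical underflow
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def hmm_param_count(n_states):
    """Free parameters of a 1-D Gaussian HMM with n_states states:
    (n-1) initial probabilities + n(n-1) transitions + 2n mean/std values."""
    return (n_states - 1) + n_states * (n_states - 1) + 2 * n_states


def aic(log_lik, k):
    """Akaike information criterion (lower is better)."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n_obs):
    """Bayesian information criterion (lower is better)."""
    return k * math.log(n_obs) - 2 * log_lik


def most_likely_cluster(obs, models):
    """Index of the HMM with maximum likelihood for obs, plus all scores."""
    scores = [forward_log_likelihood(obs, m) for m in models]
    best = max(range(len(models)), key=lambda c: scores[c])
    return best, scores


# Two illustrative cluster HMMs (parameters are made up for the example).
low = {"pi": [0.5, 0.5], "A": [[0.9, 0.1], [0.1, 0.9]],
       "means": [0.1, 0.3], "stds": [0.05, 0.05]}
high = {"pi": [0.5, 0.5], "A": [[0.9, 0.1], [0.1, 0.9]],
        "means": [0.7, 0.9], "stds": [0.05, 0.05]}

window = [0.72, 0.68, 0.88, 0.91]  # recent normalized load samples
cluster, scores = most_likely_cluster(window, [low, high])
k = hmm_param_count(2)
print(cluster, aic(scores[cluster], k), bic(scores[cluster], k, len(window)))
```

In a full pipeline, candidate HMMs of different sizes would be trained first and the configuration with the lowest AIC or BIC retained; the data of the selected cluster would then feed the forecasting network.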
