Load Prediction for Data Centers Based on Database Service

In the era of cloud computing, over-occupancy of data center resources (CPU, memory, disk) and the machine failures that follow have caused great losses to users and enterprises, so it makes sense to anticipate server workload in advance. Previous research on server workloads has focused on trend analysis and time series fitting. We propose a machine-learning approach to forecasting server workloads. Our data comes from a real, large-scale, enterprise-class data center built on database services. Our models use the servers' historical monitoring data to predict future workloads, which makes it possible to warn of overload automatically and to reallocate resources. We calculate the failure detection rate and false alarm rate of our overload detection models, and we also propose an evaluation based on overload processing cost. Experimental results show that machine learning methods, especially Random Forest, predict server load better than the traditional time series analysis method. We use the forecast results to propose scheduling strategies that prevent server overload and support intelligent operation and maintenance and failure prediction. Compared with the traditional time series analysis method, our method uses less data and fewer dimensions, and yields more accurate predictions.
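As a rough illustration of the evaluation described above, the following sketch computes a failure detection rate, a false alarm rate, and a simple overload processing cost from binary overload labels. It assumes the conventional definitions (detection rate = TP / (TP + FN), false alarm rate = FP / (FP + TN)); the abstract does not state the exact formulas, and the function names, the sample labels, and the cost weights `miss_cost` and `false_alarm_cost` are hypothetical.

```python
from typing import Sequence, Tuple


def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, fn, fp, tn) for binary labels where 1 means 'overloaded'."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # overload correctly flagged
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # overload missed
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarm
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # normal load, no alarm
    return tp, fn, fp, tn


def detection_and_false_alarm_rates(actual: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Return (detection rate, false alarm rate) under the assumed definitions."""
    tp, fn, fp, tn = confusion_counts(actual, predicted)
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_alarm_rate


def overload_processing_cost(actual: Sequence[int], predicted: Sequence[int],
                             miss_cost: float = 10.0, false_alarm_cost: float = 1.0) -> float:
    """Hypothetical cost model: each missed overload and each false alarm has a fixed cost."""
    _, fn, fp, _ = confusion_counts(actual, predicted)
    return fn * miss_cost + fp * false_alarm_cost


# Made-up labels for eight monitoring windows (1 = overloaded).
actual = [1, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 0, 0]

detection_rate, false_alarm_rate = detection_and_false_alarm_rates(actual, predicted)
cost = overload_processing_cost(actual, predicted)
print(detection_rate, false_alarm_rate, cost)
```

Weighting a missed overload more heavily than a false alarm reflects the assumption that an undetected overload is costlier to handle than an unnecessary warning; the actual weights would depend on the operating environment.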
