Performance prediction and adaptation for database management system workload using Case-Based Reasoning approach

Abstract Workload management in a Database Management System (DBMS) has become difficult because workloads are complex and heterogeneous. A workload is hard to control during and after its execution, so predicting its performance before execution can help manage it. Knowing the type of workload in advance allows its performance to be predicted adaptively, which in turn enables monitoring and control of the workload and ultimately supports performance tuning of the DBMS. This study proposes a predictive and adaptive framework, the Autonomic Workload Performance Prediction (AWPP) framework. AWPP predicts and adapts DBMS workload performance using information that is available before the workload is executed. It applies the Case-Based Reasoning (CBR) approach to the workload management problem, and the proposed CBR approach is compared with other machine learning techniques. To validate the AWPP framework, a number of Decision Support System (DSS) and Online Transaction Processing (OLTP) benchmark workloads are executed on the MySQL DBMS. To prepare the training and testing data, we executed more than 1000 TPC-H-like and TPC-C-like workloads on a standard data set. The results show that the AWPP framework with CBR modeling performs better than the compared techniques in predicting and adapting the DBMS workload. DBMS algorithms can be optimized using this prediction, and the workload can be controlled and managed more effectively. Finally, the results are validated with post-hoc tests.
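
The case-based reasoning idea named in the abstract can be illustrated with a minimal sketch. This is not the AWPP implementation: the feature vector (read fraction, number of joins), the performance labels ("short", "long"), the Euclidean distance, and the majority vote over the k nearest cases are all assumptions chosen for illustration. It shows only the generic CBR cycle of retrieving similar past workloads, reusing their outcome as the prediction, and retaining the new case once its actual performance is known.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Case:
    # Hypothetical pre-execution workload features, e.g. (read_fraction, join_count).
    features: tuple
    # Hypothetical performance class observed after execution.
    label: str


class CaseBase:
    """Tiny case base supporting the retrieve, reuse, and retain steps of CBR."""

    def __init__(self):
        self.cases = []

    def retain(self, case):
        # Retain: store a solved case so later predictions can adapt to it.
        self.cases.append(case)

    def retrieve(self, features, k=3):
        # Retrieve: return the k stored cases closest to the query (Euclidean distance).
        def distance(case):
            return sqrt(sum((a - b) ** 2 for a, b in zip(case.features, features)))

        return sorted(self.cases, key=distance)[:k]

    def predict(self, features, k=3):
        # Reuse: majority vote over the labels of the retrieved cases.
        votes = {}
        for case in self.retrieve(features, k):
            votes[case.label] = votes.get(case.label, 0) + 1
        return max(votes, key=votes.get)


# Illustrative, made-up cases: join-heavy DSS-like and join-free OLTP-like workloads.
case_base = CaseBase()
for features, label in [
    ((0.90, 5.0), "long"),
    ((0.95, 6.0), "long"),
    ((0.30, 0.0), "short"),
    ((0.40, 1.0), "short"),
    ((0.35, 0.0), "short"),
]:
    case_base.retain(Case(features, label))

predicted = case_base.predict((0.92, 5.5))
# After the workload actually runs, its observed outcome is retained as a new case.
case_base.retain(Case((0.92, 5.5), predicted))
```

Retaining each executed workload is what makes such a predictor adaptive: the case base grows with the observed workloads, so later predictions reflect them without a separate retraining step.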
