Autonomic performance prediction framework for data warehouse queries using lazy learning approach

Abstract: Information is one of the most important assets of an organization. In recent years, the volume of data stored in organizations, the variability of user requirements, time constraints, and the complexity of query management have all grown sharply. As a result, performance modeling of queries in data warehouses (DWs) has assumed a key role in organizations. DWs make relevant information available to decision-makers; however, DW administration is becoming increasingly difficult and time-consuming. DW administrators spend too much time managing queries, which in turn degrades data warehouse performance. To improve the performance of overloaded data warehouses with varying queries, a prediction-based framework is required that forecasts the behavior of query performance metrics in a DW. In this study, we propose a cluster-based autonomic performance prediction framework that uses a case-based reasoning approach and incorporates autonomic computing characteristics to determine the performance metrics of the data warehouse in advance. This prediction supports query monitoring and management. For evaluation, we used precision, recall, accuracy, and relative error rate. We compared the proposed approach with existing lazy learning techniques on the standard TPC-H dataset. Experiments show that the proposed approach produces better results than the existing techniques.
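The general idea of cluster-based, case-based (lazy) prediction can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the query features, the distance measure, the clustering method, or the number of retrieved cases, so the Euclidean distance, the toy two-dimensional feature vectors, the value `k=2`, and all names below (`nearest_cluster`, `predict_metric`, `relative_error`, `case_base`, `centroids`) are assumptions chosen only for illustration.

```python
from math import dist          # Euclidean distance (Python 3.8+)
from statistics import mean


def nearest_cluster(centroids, features):
    """Return the index of the centroid closest to the query features."""
    return min(range(len(centroids)), key=lambda i: dist(centroids[i], features))


def predict_metric(case_base, centroids, features, k=3):
    """Predict a performance metric for a new query from stored cases.

    case_base: list of (features, cluster_id, observed_metric) tuples.
    Only cases in the query's nearest cluster are searched; the prediction
    is the mean metric of the k closest cases (assumed retrieval rule).
    """
    cluster = nearest_cluster(centroids, features)
    candidates = [case for case in case_base if case[1] == cluster]
    if not candidates:
        raise ValueError("no stored cases in the nearest cluster")
    candidates.sort(key=lambda case: dist(case[0], features))
    return mean(metric for _, _, metric in candidates[:k])


def relative_error(predicted, actual):
    """Relative error of a prediction against the observed value."""
    return abs(predicted - actual) / abs(actual)


# Hypothetical toy data: two clusters of past queries with observed run times.
centroids = [(1.0, 1.0), (10.0, 10.0)]
case_base = [
    ((1.0, 1.0), 0, 2.0),
    ((2.0, 1.0), 0, 4.0),
    ((1.0, 2.0), 0, 3.0),
    ((10.0, 10.0), 1, 50.0),
    ((11.0, 9.0), 1, 60.0),
]

predicted = predict_metric(case_base, centroids, (1.0, 1.2), k=2)
print(predicted)                       # 2.5
print(relative_error(predicted, 2.0))  # 0.25
```

Restricting retrieval to one cluster keeps the lazy learner from scanning the whole case base for every incoming query, which is one plausible motivation for combining clustering with case-based reasoning.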
