An automatic clustering technique for query plan recommendation

Abstract The query optimizer is responsible for identifying the most efficient Query Execution Plans (QEPs). In a distributed database, relations may be stored at several sites, which dramatically increases the number of alternative query plans. The query optimizer cannot exhaustively explore these alternatives in such a vast search space at a reasonable computational cost. Hence, reusing previously generated plans instead of generating new plans for new queries is an efficient technique for query processing. To improve the accuracy of clustering, we rewrite the queries to standardize their structures. Furthermore, a term frequency (TF) representation scheme is used to convert the queries into vectors. In this paper, we introduce a multi-objective automatic query plan recommendation method that combines incremental DBSCAN and NSGA-II. The quality of the results of incremental DBSCAN is influenced by its two parameters, MinPts (minimum points) and Eps (epsilon). Two cluster validity indices, the Dunn index and the Davies–Bouldin index, are optimized simultaneously to assess the goodness of a solution. Comparative results against incremental DBSCAN and K-means are reported in terms of an external cluster validity index, namely the Adjusted Rand Index (ARI). Across different types of query workloads, we found that the introduced method outperforms these well-known approaches.
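
The two internal objectives named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it omits incremental DBSCAN and NSGA-II and only shows how a candidate clustering of TF-encoded queries could be scored with the Dunn index (higher is better) and the Davies–Bouldin index (lower is better). The tokenizer, which drops literals as a crude stand-in for query standardization, the Euclidean distance, the sample queries, and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tf_vectors(queries):
    """Encode each query as a term-frequency vector over a shared vocabulary.

    Assumption: only identifier-like tokens are kept, so numeric literals
    are dropped; this stands in for the paper's query rewriting step.
    """
    token_lists = [re.findall(r"[a-z_][a-z0-9_]*", q.lower()) for q in queries]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append([counts[t] / total for t in vocab])
    return vocab, vectors


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_by_label(vectors, labels):
    clusters = {}
    for vec, label in zip(vectors, labels):
        clusters.setdefault(label, []).append(vec)
    return list(clusters.values())


def dunn_index(vectors, labels):
    """Smallest between-cluster distance over largest cluster diameter.

    Requires at least two clusters.
    """
    clusters = group_by_label(vectors, labels)
    min_separation = min(
        euclidean(a, b)
        for i, ci in enumerate(clusters)
        for cj in clusters[i + 1:]
        for a in ci
        for b in cj
    )
    max_diameter = max(euclidean(a, b) for c in clusters for a in c for b in c)
    return min_separation / max_diameter if max_diameter > 0 else math.inf


def davies_bouldin_index(vectors, labels):
    """Mean, over clusters, of the worst (scatter_i + scatter_j) / centroid distance.

    Requires at least two clusters with distinct centroids.
    """
    clusters = group_by_label(vectors, labels)
    centroids = [[sum(col) / len(c) for col in zip(*c)] for c in clusters]
    scatter = [
        sum(euclidean(v, m) for v in c) / len(c)
        for c, m in zip(clusters, centroids)
    ]
    k = len(clusters)
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / euclidean(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst_ratios) / k


# Hypothetical workload: two lookup queries and two aggregation queries.
queries = [
    "SELECT name FROM emp WHERE id = 1",
    "SELECT name FROM emp WHERE id = 2",
    "SELECT SUM(price) FROM orders GROUP BY day",
    "SELECT SUM(price) FROM orders GROUP BY month",
]
vocab, vectors = tf_vectors(queries)

good_labels = [0, 0, 1, 1]  # groups structurally similar queries together
bad_labels = [0, 1, 0, 1]   # mixes lookups with aggregations

dunn_good = dunn_index(vectors, good_labels)
dunn_bad = dunn_index(vectors, bad_labels)
db_good = davies_bouldin_index(vectors, good_labels)
db_bad = davies_bouldin_index(vectors, bad_labels)

print("Dunn index:", dunn_good, "vs", dunn_bad)
print("Davies-Bouldin index:", db_good, "vs", db_bad)
```

In a multi-objective search, each candidate (MinPts, Eps) setting would yield a clustering, and these two scores would serve as the pair of objectives to trade off. The sketch scores fixed labelings only so that the behavior of the indices is easy to inspect.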
