On K-means clustering-based approach for DDBSs design

In Distributed Database Systems (DDBSs), communication costs and response time have long been open challenges. However, a carefully designed DDBS can achieve the desired reduction in communication costs. Data fragmentation (data clustering) and data allocation are the prime strategies most commonly used to design a DDBS. Building on these strategies, several design techniques have been proposed in the literature to improve DDBS performance. Most of them, however, rely on either empirical results or data statistics. This makes them imperfect or even inapplicable, particularly at the initial stage of DDBS design, when such information is usually not yet available. This paper therefore introduces a heuristic k-means approach for vertical fragmentation and allocation that focuses primarily on DDBS design at the initial stage. The approach combines several techniques into a single step. A brief yet effective experimental study on both artificially created and real datasets has been conducted to demonstrate the optimality of the proposed approach compared with its counterparts, and the obtained results are encouraging.
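The abstract does not spell out the algorithm itself. As a rough illustration only, the sketch below assumes a common setup from the vertical-fragmentation literature. Each attribute is described by a binary usage vector over the queries (1 if the query accesses it). Attributes are grouped with plain k-means (Lloyd's algorithm, seeded deterministically by maximin selection). Each resulting fragment is then placed at the site that issues the most accesses to it. The functions `fragment` and `allocate`, the `usage` and `freq` structures, and the toy data are all hypothetical. This is not the paper's method, cost model, or centroid-selection scheme.

```python
# Hypothetical sketch: k-means vertical fragmentation + greedy allocation.
# Not the paper's algorithm; data, names, and cost model are assumptions.


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def maximin_init(vectors, k):
    """Deterministic seeding: start from the first vector, then repeatedly
    pick the vector farthest from all centroids chosen so far."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(vectors, k, max_iter=100):
    """Lloyd's k-means; returns one cluster label per input vector."""
    centroids = maximin_init(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def fragment(usage, k):
    """Group attributes whose query-usage vectors are similar."""
    names = list(usage)
    labels = kmeans([usage[a] for a in names], k)
    frags = [[] for _ in range(k)]
    for name, lab in zip(names, labels):
        frags[lab].append(name)
    return [f for f in frags if f]  # drop empty clusters


def allocate(fragments, usage, freq):
    """Place each fragment at the site issuing the most accesses to it.
    freq[q][s] = how often query q runs at site s (assumed known)."""
    n_sites = len(freq[0])
    placement = {}
    for i, frag in enumerate(fragments):
        # Queries that touch at least one attribute of this fragment.
        queries = [q for q in range(len(freq)) if any(usage[a][q] for a in frag)]
        load = [sum(freq[q][s] for q in queries) for s in range(n_sites)]
        placement[i] = max(range(n_sites), key=lambda s: load[s])
    return placement


# Toy attribute usage matrix: usage[attr][q] = 1 if query q reads attr.
usage = {
    "A1": [1, 1, 0, 0],
    "A2": [1, 1, 0, 0],
    "A3": [0, 0, 1, 1],
    "A4": [0, 0, 1, 1],
    "A5": [1, 0, 0, 0],
    "A6": [0, 0, 0, 1],
}
# Toy access frequencies of queries Q1..Q4 at two sites.
freq = [[10, 0], [8, 1], [0, 7], [1, 9]]

frags = fragment(usage, k=2)
placement = allocate(frags, usage, freq)
print(frags)      # [['A1', 'A2', 'A5'], ['A3', 'A4', 'A6']]
print(placement)  # {0: 0, 1: 1}
```

Deterministic maximin seeding is used here only so that the toy run is reproducible. Practical designs usually also replicate the primary key in every vertical fragment so that the relation can be reconstructed, and they use a communication-cost model rather than raw access counts for allocation. Both refinements are omitted from this sketch.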
