Integrating frequent pattern clustering and branch-and-bound approaches for data partitioning

In this paper, we propose an approach that integrates frequent pattern clustering with a branch-and-bound algorithm to find an optimal database partition. First, the Apriori algorithm and cosine similarity are used to determine weighted frequent patterns from a transaction profile. On the basis of these weighted frequent patterns, we develop two methods for partitioning a database: the candidate method and the optimal method. The optimal method uses a branch-and-bound algorithm and evaluates the cost at each step of combining attributes until an optimal solution is reached. We further refine the optimal method to speed up execution by reducing the search space. Experimental results show that the proposed optimal method performs best among all examined methods, and that the refined method is considerably more efficient than the original optimal method.
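
As an illustration of the first step only, the following minimal sketch mines frequent attribute sets from a small transaction profile with a level-wise Apriori search and then weights each pattern using cosine similarity between attribute usage vectors. The profile format (a list of attribute-set and frequency pairs), the function names, and the weighting rule (support multiplied by the mean pairwise cosine similarity) are all assumptions made for this example; the abstract does not specify how the weights are computed.

```python
from itertools import combinations
from math import sqrt


def apriori(profile, min_support):
    """Return {frozenset(attributes): support} for all frequent attribute sets.

    profile is a list of (attribute_set, frequency) pairs (assumed format).
    Support is the frequency-weighted share of transactions using the set.
    """
    total = sum(freq for _, freq in profile)

    def support(itemset):
        return sum(freq for attrs, freq in profile if itemset <= attrs) / total

    items = sorted({a for attrs, _ in profile for a in attrs})
    level = [frozenset([a]) for a in items]
    frequent = {}
    k = 1
    while level:
        # Keep only candidates that meet the support threshold.
        survivors = {}
        for cand in level:
            s = support(cand)
            if s >= min_support:
                survivors[cand] = s
        frequent.update(survivors)
        k += 1
        # Join step: merge frequent (k-1)-sets into k-sets.
        joined = {a | b for a in survivors for b in survivors if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        level = [
            c for c in joined
            if all(frozenset(sub) in survivors for sub in combinations(c, k - 1))
        ]
    return frequent


def usage_vector(profile, attr):
    """Frequency with which each transaction uses attr (0 if unused)."""
    return [freq if attr in attrs else 0 for attrs, freq in profile]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def weighted_patterns(profile, min_support):
    """Weight each multi-attribute pattern (hypothetical weighting rule)."""
    weights = {}
    for pattern, sup in apriori(profile, min_support).items():
        if len(pattern) < 2:
            continue
        pairs = list(combinations(sorted(pattern), 2))
        mean_cos = sum(
            cosine(usage_vector(profile, a), usage_vector(profile, b))
            for a, b in pairs
        ) / len(pairs)
        weights[pattern] = sup * mean_cos
    return weights


# Hypothetical profile: three transactions with their access frequencies.
profile = [({"a", "b"}, 3), ({"a", "b", "c"}, 1), ({"c", "d"}, 2)]
frequent = apriori(profile, 0.5)
weights = weighted_patterns(profile, 0.5)
```

In this toy profile, attributes `a` and `b` are always accessed together, so they form the only frequent pattern with more than one attribute. Such patterns would then serve as the starting clusters that a partitioning step tries to combine.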
