A genetic algorithm-based clustering approach for database partitioning

In a typical distributed or parallel database system, a request usually accesses only a subset of the database. It is therefore natural to group commonly accessed data and to place it on nearby, preferably the same, machines or sites. For this reason, data partitioning and data allocation are performance-critical issues in distributed database application design. This paper addresses data partitioning, which relies on clustering. Although many clustering algorithms have been proposed, their performance has not been studied extensively, and the special structure of the clustering problem is rarely exploited. We explore a genetic search-based clustering algorithm for data partitioning that aims at high database retrieval performance. Formulating the underlying problem as a traveling salesman problem (TSP) lets us exploit this structure. We also propose three new operators for genetic algorithms (GAs); experimental results indicate that they outperform other operators in solving the TSP. The proposed GA is then applied to the data-partitioning problem, and our computational study shows that it performs well for this application.
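To make the TSP formulation concrete, the following is a minimal sketch of a GA that orders data items so that items with high mutual affinity end up adjacent in a tour. It is not the authors' algorithm: the abstract does not describe the three new operators, so the sketch substitutes the standard order crossover and a swap mutation. The affinity matrix, the conversion `distance = max_affinity - affinity`, and all function names and parameter values are assumptions made for illustration.

```python
import random

def tour_length(tour, dist):
    """Total length of the closed tour under the distance matrix."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def order_crossover(p1, p2, rng):
    """Order crossover (OX): copy a slice from p1, fill the rest in p2's order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    kept = set(child[a:b + 1])
    fill = iter(g for g in p2 if g not in kept)
    for i in range(n):
        if child[i] is None:
            child[i] = next(fill)
    return child

def swap_mutation(tour, rng, rate=0.2):
    """With probability `rate`, swap two positions in a copy of the tour."""
    tour = tour[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]
    return tour

def genetic_tsp(dist, pop_size=40, generations=200, seed=0):
    """Elitist GA: keep the better half, refill with mutated OX children."""
    rng = random.Random(seed)
    n = len(dist)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: tour_length(t, dist))
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(swap_mutation(order_crossover(p1, p2, rng), rng))
        pop = elite + children
    return min(pop, key=lambda t: tour_length(t, dist))

# Hypothetical affinity matrix: how often items i and j are accessed together.
affinity = [
    [0, 9, 8, 1, 0, 1],
    [9, 0, 7, 0, 1, 0],
    [8, 7, 0, 1, 0, 1],
    [1, 0, 1, 0, 9, 8],
    [0, 1, 0, 9, 0, 7],
    [1, 0, 1, 8, 7, 0],
]
# Assumed conversion: high affinity becomes short distance.
max_aff = max(max(row) for row in affinity)
dist = [[0 if i == j else max_aff - a for j, a in enumerate(row)]
        for i, row in enumerate(affinity)]

best = genetic_tsp(dist)
print(best, tour_length(best, dist))
```

One plausible way to turn the resulting tour into partitions, again as an assumption rather than the paper's procedure, is to cut it at its longest edges, so that each remaining run of adjacent items forms one cluster.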
