A Multi-copy Join Optimization of Information Integration Systems Based on a Genetic Algorithm

In view of the inevitable redundancy among local data sources in heterogeneous information integration systems, a multi-copy join optimization method based on a genetic algorithm (MuCoJo for short) is proposed. MuCoJo chooses appropriate redundant copies of the tables to participate in a join query and optimizes the query's join order. By considering the redundant copies, MuCoJo enlarges the search space so that concurrent execution across different local sources can be exploited as fully as possible. MuCoJo thereby takes advantage of the redundancy inherent in such systems to obtain shorter join query response times. Experimental results show the computational efficiency of MuCoJo and its necessity in information integration systems. Moreover, controlling invalid solutions during population initialization effectively reduces the search space, at the cost of longer initialization time.
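The abstract does not specify MuCoJo's chromosome encoding, cost model, or genetic operators. The following is therefore only a minimal Python sketch, under stated assumptions, of how a genetic algorithm can search replica choice and join order together. A chromosome pairs a join order (a permutation of tables) with one replica index per table. The tables, replicas, sites, cardinalities, join graph, selectivity, and cost constants (`COPIES`, `CARD`, `EDGES`, `SELECTIVITY`) are hypothetical. Fitness is a toy estimate in which sites scan their replicas in parallel, so the scan phase costs as much as the busiest site, plus a left-deep join cost. Orders that would require a cross product are treated as invalid. With `control_invalid=True`, initialization rejects such orders by resampling, which shrinks the search space but makes initialization slower, mirroring the trade-off described above. The crossover is an order-crossover variant and selection is simple truncation; neither is claimed to be the authors' choice.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    site: str          # local source holding this replica (hypothetical)
    scan_cost: float   # hypothetical cost of reading the replica at that site


# Hypothetical instance: each table may have redundant copies on several sites.
COPIES = {
    "A": [Copy("s1", 4.0), Copy("s2", 5.0)],
    "B": [Copy("s1", 3.0), Copy("s3", 3.5)],
    "C": [Copy("s2", 6.0)],
    "D": [Copy("s3", 2.0), Copy("s1", 2.5)],
}
CARD = {"A": 1000, "B": 500, "C": 2000, "D": 100}
EDGES = {frozenset(e) for e in [("A", "B"), ("B", "C"), ("C", "D")]}
SELECTIVITY = 0.01  # hypothetical uniform join selectivity


def is_valid(order):
    """Valid means each table joins with some earlier table (no cross product)."""
    return all(any(frozenset((t, p)) in EDGES for p in order[:i])
               for i, t in enumerate(order) if i > 0)


def fitness(chrom):
    """Toy response-time estimate; lower is better."""
    order, picks = chrom
    load = {}
    for t in order:
        c = COPIES[t][picks[t]]
        load[c.site] = load.get(c.site, 0.0) + c.scan_cost
    scan_time = max(load.values())          # sites scan concurrently
    size, join_cost = CARD[order[0]], 0.0
    for t in order[1:]:                     # left-deep join pipeline
        join_cost += size * CARD[t] * 1e-5  # hypothetical per-pair cost
        size = size * CARD[t] * SELECTIVITY
    penalty = 0.0 if is_valid(order) else 1e9
    return scan_time + join_cost + penalty


def random_chrom(rng, control_invalid):
    """Random join order plus random replica choice per table."""
    tables = list(COPIES)
    while True:  # rejection sampling: slower init, smaller search space
        order = rng.sample(tables, len(tables))
        if not control_invalid or is_valid(order):
            break
    picks = {t: rng.randrange(len(COPIES[t])) for t in tables}
    return order, picks


def crossover(rng, a, b):
    """Keep a segment of a's order in place, fill the rest in b's order."""
    (oa, pa), (ob, pb) = a, b
    i, j = sorted(rng.sample(range(len(oa) + 1), 2))
    middle = oa[i:j]
    rest = [t for t in ob if t not in middle]
    order = rest[:i] + middle + rest[i:]
    picks = {t: (pa if rng.random() < 0.5 else pb)[t] for t in pa}
    return order, picks


def mutate(rng, chrom, rate=0.2):
    """Swap two positions in the order and/or re-pick one table's replica."""
    order, picks = list(chrom[0]), dict(chrom[1])
    if rng.random() < rate:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    if rng.random() < rate:
        t = rng.choice(order)
        picks[t] = rng.randrange(len(COPIES[t]))
    return order, picks


def mucojo_sketch(pop_size=20, generations=40, control_invalid=True, seed=0):
    """Hypothetical GA driver with truncation selection and elitism."""
    rng = random.Random(seed)
    pop = [random_chrom(rng, control_invalid) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 2]
        children = [mutate(rng, crossover(rng, *rng.sample(elite, 2)))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return min(pop, key=fitness)


if __name__ == "__main__":
    order, picks = mucojo_sketch()
    print(order, {t: COPIES[t][i].site for t, i in picks.items()})
```

Because the elite half is carried over unchanged, the best chromosome never gets worse between generations; and since every initial chromosome is valid when `control_invalid=True`, the returned plan always avoids cross products in this toy model.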
