GATuner: Tuning Schema Matching Systems Using Genetic Algorithms

Most recent schema matching systems combine multiple components, each of which employs a particular matching technique with several knobs. This multi-component design makes the systems hard for domain users to tune. In this paper, we present GATuner, an approach that automatically tunes schema matching systems using genetic algorithms. We match a given schema S against generated scenarios for which the ground-truth matches are known, and find a configuration that improves the performance of matching S against real schemas. To search the huge space of candidate configurations efficiently, we adopt genetic algorithms in the tuning process. Experiments over four real-world domains with two main matching systems demonstrate that our approach produces higher-quality matches across different domains.
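The tuning loop described above can be illustrated with a minimal sketch. It is not GATuner's implementation: the toy matcher, its two knobs (`threshold`, `fold_case`), the perturbation rules used to generate scenarios, the F1-based fitness, and the GA parameters are all assumptions chosen for brevity.

```python
import random
from difflib import SequenceMatcher

def match(source, target, config):
    """Hypothetical toy matcher with two knobs: threshold and fold_case."""
    pairs = set()
    for s in source:
        for t in target:
            a, b = (s.lower(), t.lower()) if config["fold_case"] else (s, t)
            if SequenceMatcher(None, a, b).ratio() >= config["threshold"]:
                pairs.add((s, t))
    return pairs

def f1(predicted, truth):
    """F1 score of predicted match pairs against the ground truth."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(truth)
    return 2 * precision * recall / (precision + recall)

def perturb(name, rng):
    """Assumed perturbation rules: change case, add a suffix, or truncate."""
    choice = rng.randrange(3)
    if choice == 0:
        return name.upper()
    if choice == 1:
        return name + "_id"
    return name[:-1] if len(name) > 3 else name

def make_scenarios(schema, n, rng):
    """Generate n synthetic target schemas; the true matches are known."""
    scenarios = []
    for _ in range(n):
        target = [perturb(a, rng) for a in schema]
        scenarios.append((target, set(zip(schema, target))))
    return scenarios

def random_config(rng):
    return {"threshold": rng.uniform(0.0, 1.0), "fold_case": rng.random() < 0.5}

def crossover(a, b, rng):
    """Uniform crossover: each knob is inherited from one parent."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in a}

def mutate(config, rng, rate=0.2):
    """Randomly nudge the threshold and/or flip the case-folding knob."""
    c = dict(config)
    if rng.random() < rate:
        c["threshold"] = min(1.0, max(0.0, c["threshold"] + rng.gauss(0, 0.1)))
    if rng.random() < rate:
        c["fold_case"] = not c["fold_case"]
    return c

def tune(schema, generations=15, pop_size=12, seed=0):
    """Search knob configurations with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    scenarios = make_scenarios(schema, 5, rng)

    def fitness(c):
        # Average F1 of the matcher over all synthetic scenarios.
        scores = [f1(match(schema, t, c), truth) for t, truth in scenarios]
        return sum(scores) / len(scores)

    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: pop_size // 3]  # survivors carried over unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best)

if __name__ == "__main__":
    best, score = tune(["customerName", "orderDate", "totalPrice", "city"])
    print(best, round(score, 3))
```

Because the elite are carried over unchanged, the best fitness never decreases from one generation to the next. In a real setting, the fitness call would run the actual matching system with the candidate configuration, which is why limiting the number of evaluated configurations matters.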
