Critical Points for Interactive Schema Matching

Experience suggests that fully automated schema matching is infeasible, especially for n-to-m matches involving semantic functions. A matching algorithm should therefore not only do as much as possible automatically, but also accurately identify the critical points where user input is maximally useful. Our matching algorithm combines several existing approaches, with a new emphasis on using the context provided by the way elements are embedded in paths. A prototype tested on biological data (gene sequence, DNA, RNA, etc.) and on bibliographic data shows significant performance improvements from user feedback and context checking. In non-interactive mode on the purchase order schemas, it compares favorably with COMA, the most mature schema matching system in the literature, and also correctly identifies critical points for user input.
