Schema matching, the problem of finding semantic correspondences between the elements of two schemas, plays a key role in many applications, such as data warehousing, integration of heterogeneous data sources, and the Semantic Web. Existing approaches to automating schema matching mostly focus on computing direct element matches (1:1 matches) between two schemas. However, relationships between real-world schemas involve many complex matches in addition to 1:1 matches. At present, only a few methods, such as iMAP, can discover complex matches, and most of them are inefficient because the candidate match space they must search is very large. This paper introduces a complex schema matching system called CSM. First, its preprocessor and clustering processor filter out unreasonable matches based on data types and values. Then, its match generator employs a set of special-purpose searchers, each exploring a specialized portion of the search space, to discover 1:1 and complex matches. Finally, its similarity estimator scores the candidate matches and its match selector selects the optimal ones. Experiments show that CSM not only discovers the matches between schemas comprehensively, but also improves matching recall and precision in practice.
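The filter, generate, estimate, and select stages described above can be illustrated with a minimal Python sketch. It is not CSM's implementation: the column names, the sample values, the single concatenation searcher, and the Jaccard-based score are all assumptions chosen for illustration, and the clustering step is omitted.

```python
from itertools import permutations

# Toy column data; every name and value here is hypothetical.
source = {
    "first_name": ["Ann", "Bob"],
    "last_name": ["Lee", "Ray"],
    "age": [31, 45],
}
target = {
    "full_name": ["Ann Lee", "Bob Ray"],
    "years": [45, 31],
}

def column_type(values):
    """Coarse data type of a column: 'num' or 'text'."""
    return "num" if all(isinstance(v, (int, float)) for v in values) else "text"

def candidates(source, target_values):
    """Yield 1:1 candidates and two-column text concatenations.

    Source columns whose coarse type differs from the target's are
    filtered out first, which shrinks the candidate space.
    """
    t_type = column_type(target_values)
    usable = [n for n, v in source.items() if column_type(v) == t_type]
    for name in usable:                      # 1:1 candidates
        yield (name,), list(source[name])
    if t_type == "text":                     # complex (concatenation) candidates
        for a, b in permutations(usable, 2):
            yield (a, b), [f"{x} {y}" for x, y in zip(source[a], source[b])]

def similarity(produced, expected):
    """Jaccard overlap of the two value sets (an assumed scoring function)."""
    p, e = set(produced), set(expected)
    return len(p & e) / len(p | e) if p | e else 0.0

def select_matches(source, target):
    """Keep the highest-scoring candidate for each target column."""
    result = {}
    for t_name, t_values in target.items():
        scored = [(similarity(vals, t_values), cols)
                  for cols, vals in candidates(source, t_values)]
        result[t_name] = max(scored, key=lambda s: s[0])[1] if scored else None
    return result

matches = select_matches(source, target)
```

In this toy setting, the type filter removes `age` from the candidates for `full_name`, so only text columns and their pairwise concatenations are scored. A real system would need many more searchers and a more robust similarity measure than value-set overlap.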
[1] AnHai Doan, et al. iMAP: Discovering Complex Mappings between Database Schemas. SIGMOD 2004, 2004.
[2] AnHai Doan, et al. Corpus-based schema matching. 21st International Conference on Data Engineering (ICDE'05), 2005.
[3] Chris Clifton, et al. SEMINT: A tool for identifying attribute correspondences in heterogeneous databases using neural networks. Data Knowl. Eng., 2000.
[4] Pedro M. Domingos, et al. iMAP: discovering complex semantic matches between database schemas. SIGMOD '04, 2004.
[5] Erhard Rahm, et al. A survey of approaches to automatic schema matching. The VLDB Journal, 2001.
[6] Erhard Rahm, et al. Generic Schema Matching with Cupid. VLDB, 2001.
[7] Erhard Rahm, et al. COMA - A System for Flexible Combination of Schema Matching Approaches. VLDB, 2002.