Ontology-Driven Semantic Matches between Database Schemas

Schema matching has historically been difficult to automate. Most previous studies find matches by exploiting information in the schemas and data instances. However, schemas and data instances cannot fully capture the semantics of a database, so some attributes are matched to the wrong counterparts. To address this problem, we propose a schema matching framework that identifies correct matches by extracting semantics from ontologies. In an ontology, two concepts share the semantics captured by their common parent, and that parent can also be used to quantify the similarity between them. By combining this idea with effective contemporary mapping algorithms, we perform ontology-driven semantic matching across multiple data sources. Experimental results indicate that the proposed method identifies more accurate matches than previous work.
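The common-parent idea can be illustrated with a small sketch. The abstract does not specify the similarity measure, so the code below assumes a Wu-Palmer-style depth ratio as a stand-in. The toy ontology and the names PARENT, ancestors, depth, common_parent, and similarity are all hypothetical and are not the framework's implementation.

```python
# Toy ontology as child -> parent links (hypothetical concepts, illustration only).
PARENT = {
    "entity": None,
    "person": "entity",
    "organization": "entity",
    "employee": "person",
    "customer": "person",
    "company": "organization",
}


def ancestors(concept):
    """Return the path from a concept up to the root, starting with the concept."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth of a concept, counting the root as depth 1."""
    return len(ancestors(concept))


def common_parent(a, b):
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    seen = set(ancestors(a))
    for candidate in ancestors(b):  # walks upward, so the first hit is the deepest
        if candidate in seen:
            return candidate
    raise ValueError("concepts do not share a root")


def similarity(a, b):
    """Assumed Wu-Palmer-style score in (0, 1]: a deeper common parent scores higher."""
    shared = common_parent(a, b)
    return 2 * depth(shared) / (depth(a) + depth(b))


if __name__ == "__main__":
    print(common_parent("employee", "customer"))
    print(similarity("employee", "customer"))
    print(similarity("employee", "company"))
```

In this sketch, "employee" and "customer" meet at "person", while "employee" and "company" meet only at the root, so the first pair receives the higher score. An information-content measure could replace the depth ratio without changing the overall structure.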
