Schema Matching and Integration for Data Sharing Among Collaborating Organizations

Schema matching and schema integration are important components of the data sharing infrastructure in Collaborative Networks. To make their results more accurate and the processes more efficient, mechanisms are needed that carry them out as automatically as possible. This paper addresses the problems and challenges related to schema matching and schema integration and introduces the Semi-Automatic Schema Matching and INTegration (SASMINT) system to automate these processes. Other systems aiming at database interoperability typically focus either on schema matching or on schema integration. SASMINT, in contrast, combines the two and uses the results of schema matching for semi-automatic schema integration. SASMINT follows a composite approach to schema matching: it combines the results of a variety of algorithms, which makes it a generic tool applicable to different types of schemas. It also provides a Sampler component that helps the user assign weights to the algorithms. Furthermore, SASMINT uses an XML-based derivation language to save the results of schema matching and schema integration and to define the components of integrated schemas, in order to further support automated query processing against integrated sources.
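
As an illustration of the composite approach described above, the following minimal sketch combines two name-similarity metrics through a weighted average. It is not SASMINT's implementation: the two metrics (a normalized edit-distance similarity and a Jaccard similarity over underscore-separated tokens), the function names, and the weights are all assumptions chosen for the example.

```python
from typing import Callable, Dict

Metric = Callable[[str, str], float]


def levenshtein_similarity(a: str, b: str) -> float:
    """Edit-distance similarity normalized to [0, 1] (case-insensitive)."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


def jaccard_token_similarity(a: str, b: str) -> float:
    """Jaccard coefficient over underscore-separated name tokens."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)


def composite_similarity(a: str, b: str,
                         metrics: Dict[str, Metric],
                         weights: Dict[str, float]) -> float:
    """Weighted average of the individual metric scores.

    The weights are hypothetical; in a composite matcher they would be
    tuned by the user (the role the abstract gives to the Sampler).
    """
    total = sum(weights[name] for name in metrics)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * metric(a, b)
               for name, metric in metrics.items()) / total


# Hypothetical configuration for the example.
METRICS: Dict[str, Metric] = {
    "levenshtein": levenshtein_similarity,
    "jaccard": jaccard_token_similarity,
}
WEIGHTS = {"levenshtein": 0.5, "jaccard": 0.5}

if __name__ == "__main__":
    score = composite_similarity("customer_name", "cust_name", METRICS, WEIGHTS)
    print(f"composite similarity: {score:.3f}")
```

Dividing by the sum of the weights keeps the combined score in [0, 1] even when the weights do not add up to one, so candidate element pairs can be compared against a single threshold.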
