Top-k generation of integrated schemas based on directed and weighted correspondences

Schema integration is the problem of creating a unified target schema from a set of existing source schemas and a set of correspondences obtained by matching those schemas. Previous methods for schema integration rely on exploring, implicitly or explicitly, the multiple design choices that are possible for the integrated schema. Such exploration relies heavily on user interaction and is therefore time consuming and labor intensive. Furthermore, previous methods ignore the additional information that typically results from the schema matching process, namely the weights and, in some cases, the directions associated with the correspondences. In this paper, we propose a more automatic approach to schema integration that uses directed and weighted correspondences between the concepts that appear in the source schemas. A key component of our approach is a novel top-k ranking algorithm that automatically generates the best candidate schemas. The algorithm gives more weight to schemas that combine the concepts with higher similarity or coverage. It thereby makes certain decisions that would otherwise likely be made by a human expert. We show that the algorithm runs in polynomial time and, moreover, performs well in practice.
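To make the idea of ranking candidate schemas by correspondence weights concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes that each correspondence is a (source concept, target concept, weight) triple, that a candidate schema is described by choosing, per correspondence, whether the two concepts are merged, and that a candidate's score is the sum of the weights of the merged correspondences. The function name `top_k_merge_choices`, the example concepts, and the scoring rule are all hypothetical. The sketch enumerates candidates by brute force, so it is exponential in the number of correspondences, unlike the polynomial-time algorithm claimed above, and it ignores directions.

```python
import heapq
from itertools import product


def top_k_merge_choices(correspondences, k):
    """Return the k highest-scoring merge choices as (score, choice) pairs.

    correspondences: list of (source_concept, target_concept, weight).
    A choice is a tuple of booleans, one per correspondence;
    True means "merge the two concepts in the integrated schema".
    Scoring rule (an assumption of this sketch): sum of weights of
    the merged correspondences.
    """
    def score(choice):
        return sum(w for (_, _, w), merged in zip(correspondences, choice) if merged)

    # Brute-force enumeration of all 2^n merge choices (illustration only).
    candidates = product([True, False], repeat=len(correspondences))
    best = heapq.nlargest(k, candidates, key=score)
    return [(round(score(c), 6), c) for c in best]


# Hypothetical weighted correspondences between concepts of two source schemas.
correspondences = [
    ("Employee", "Staff", 0.9),
    ("Dept", "Division", 0.6),
    ("Project", "Task", 0.3),
]

for s, choice in top_k_merge_choices(correspondences, k=3):
    merged = [(a, b) for (a, b, _), m in zip(correspondences, choice) if m]
    print(s, merged)
```

Under this scoring rule the top candidate merges every pair of corresponding concepts, and lower-ranked candidates leave the weakest correspondences unmerged first, which mirrors the preference for higher-similarity combinations described above.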
