QoM: Qualitative and Quantitative Schema Match Measure

Integration of multiple heterogeneous data sources remains a critical problem in many application domains and a challenge for researchers worldwide. Schema matching, a fundamental aspect of integration, is a well-studied problem. However, researchers have, for the most part, concentrated on developing schema matching algorithms and on their performance with respect to the number of matches produced. To the best of our knowledge, current research in schema matching does not address the quality of a match. We believe that quality of match is an important measure: it provides a basis for comparing multiple matches, and it can serve as a metric for comparing and optimizing existing match algorithms. In this paper, we define the Quality of Match (QoM) metric and provide qualitative and quantitative analysis techniques to evaluate the QoM of two given schemata. In particular, we introduce a taxonomy of schema matches as a qualitative analysis technique, and a weight-based match model that, in concert with the taxonomy, provides a quantitative measure of the QoM. We show, via examples, how QoM can be used to distinguish the “goodness” of one match from that of other matches.
