Schema mapping verification: the spicy way

Schema mapping algorithms rely on value correspondences (i.e., correspondences among semantically related attributes) to produce complex transformations among data sources. These correspondences are either specified manually or suggested by separate modules called schema matchers. The quality of the mappings produced by a mapping generation tool therefore depends strongly on the quality of the input correspondences. In this paper, we introduce the Spicy system, a novel approach to the problem of verifying the quality of mappings. Spicy is based on a three-layer architecture, in which a schema matching module provides input to a mapping generation module. A third module, the mapping verification module, then checks the candidate mappings and selects those that best transform the source into the target. At the core of the system is a new technique for comparing the structure and actual content of trees, called structural analysis. Experimental results show that, by carefully designing the comparison algorithm, it is possible to achieve both good scalability and high precision in mapping selection.
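The three-layer flow described above (match, generate, verify) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `match`, `generate`, `verify` and `Candidate` are hypothetical, schemas are flattened to attribute-to-values dictionaries rather than trees, and the `similarity` function is a toy feature comparison that stands in for, and does not reproduce, the structural analysis technique of the paper.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, Dict, List, Tuple

# All names here are hypothetical; none are taken from the Spicy system.
Correspondence = Tuple[str, str]      # (source attribute, target attribute)
Instance = Dict[str, List[str]]       # attribute name -> column of values


@dataclass
class Candidate:
    correspondences: List[Correspondence]
    score: float


def match(source: Instance, target: Instance) -> List[List[Correspondence]]:
    """Layer 1 (toy matcher): propose every one-to-one attribute pairing."""
    src, tgt = sorted(source), sorted(target)
    return [list(zip(perm, tgt)) for perm in permutations(src, len(tgt))]


def generate(corrs: List[Correspondence]) -> Callable[[Instance], Instance]:
    """Layer 2 (toy generator): turn correspondences into a transformation."""
    def transform(source: Instance) -> Instance:
        return {t: list(source[s]) for s, t in corrs}
    return transform


def features(values: List[str]) -> Tuple[float, float]:
    """Mean value length and fraction of digit characters in a column."""
    text = "".join(values)
    mean_len = len(text) / len(values) if values else 0.0
    digit_ratio = sum(c.isdigit() for c in text) / len(text) if text else 0.0
    return mean_len, digit_ratio


def similarity(a: List[str], b: List[str]) -> float:
    """Toy stand-in for content comparison: 1.0 means identical features."""
    fa, fb = features(a), features(b)
    return 1.0 / (1.0 + abs(fa[0] - fb[0]) + abs(fa[1] - fb[1]))


def verify(candidates: List[List[Correspondence]],
           source: Instance, target: Instance) -> List[Candidate]:
    """Layer 3: run each candidate mapping and rank it by how closely the
    produced data resembles the existing target data."""
    scored = []
    for corrs in candidates:
        produced = generate(corrs)(source)
        score = sum(similarity(produced[t], target[t]) for t in target) / len(target)
        scored.append(Candidate(corrs, score))
    return sorted(scored, key=lambda c: c.score, reverse=True)


source = {"name": ["Alice Smith", "Bob Jones"], "phone": ["5551234", "5559876"]}
target = {"fullName": ["Carol White", "Dan Brown"], "tel": ["5550000", "5551111"]}

ranked = verify(match(source, target), source, target)
best = ranked[0]
print(best.correspondences, best.score)
```

In this toy setting the verification layer ranks the pairing of `name` with `fullName` and `phone` with `tel` first, because the data it produces resembles the existing target data most closely. The point of the sketch is only the division of labor: the matcher may propose wrong correspondences, and the final layer filters them by inspecting the transformed data.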
