Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach

While the Internet has made information sources easier to access, integrating these heterogeneous data sources at scale remains a challenge. The adoption of the eXtensible Markup Language (XML) as the standard for data representation and exchange has led to a growing number of XML data sources, both native and non-native. Recent integration work has focused mainly on matching techniques that find equivalent elements and attributes across different XML sources. In this paper, we introduce a semantic approach to resolving structural conflicts in the integration of XML schemas. We employ a data model called ORA-SS (Object-Relationship-Attribute Model for Semi-Structured Data) to capture the semantics left implicit in an XML schema, and we present a comprehensive algorithm for integrating XML schemas. Unlike existing methods, our algorithm adopts an n-ary integration strategy. When resolving structural conflicts such as attribute/object class conflicts and ancestor-descendant conflicts, it takes into account the data semantics, the importance of each source, and how the majority of the sources model their data. Further, redundant object classes and transitive relationship sets are removed to obtain a more concise integrated schema.
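The abstract does not spell out the resolution rules, so the following is only a rough illustration of two of the ideas it names. It assumes that each source reports whether it models a concept as an attribute or as an object class, together with a numeric importance weight, and picks the modelling with the greatest total weight, so that both source importance and the number of agreeing sources count. It also treats relationship sets as edges of a directed acyclic graph and drops any edge already implied by a longer path (a transitive reduction). The function names, the weights, and the tie-break toward object classes are hypothetical. This is not the ORA-SS-based algorithm itself, which draws on the richer semantics that ORA-SS captures.

```python
from collections import defaultdict


def resolve_attribute_object_conflict(observations):
    """Hypothetical weighted-majority rule.

    observations: list of (modelling, weight) pairs, where modelling is
    "attribute" or "object_class" and weight is an assumed source importance.
    Returns the modelling with the largest total weight; on a tie it prefers
    "object_class" (an assumption: promoting to an object class keeps more structure).
    """
    totals = defaultdict(float)
    for modelling, weight in observations:
        totals[modelling] += weight
    # Sort key: total weight first, then True for "object_class" breaks ties.
    return max(totals, key=lambda m: (totals[m], m == "object_class"))


def remove_transitive_edges(edges):
    """Transitive reduction of a DAG given as a set of (src, dst) pairs.

    An edge (s, d) is dropped when d is still reachable from s through a
    path of length >= 2, i.e. the edge is implied by other edges.
    """
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)

    def reachable_indirectly(src, dst):
        # Depth-first search starting from src's other successors only.
        stack = [n for n in graph[src] if n != dst]
        seen = set(stack)
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return {(s, d) for s, d in edges if not reachable_indirectly(s, d)}
```

For example, `resolve_attribute_object_conflict([("attribute", 1.0), ("object_class", 2.0), ("attribute", 0.5)])` returns `"object_class"` (2.0 against 1.5), and `remove_transitive_edges({("a", "b"), ("b", "c"), ("a", "c")})` keeps only `("a", "b")` and `("b", "c")`, assuming the direct `a`-`c` relationship set is derivable from the other two.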
