Integration of semistructured data with partial and inconsistent information

Integrating data from several sources has gained considerable attention with the recent popularity of the World Wide Web. In the real world, some information may be missing (i.e., partial), and information from different sources may be inconsistent. How to obtain information that is as complete as possible, and how to detect inconsistencies among these sources, is thus an interesting question. Most existing work uses a simple graph-based or tree-based semistructured data model to represent heterogeneous data coming from various sites; such models fail to account for partial and inconsistent information. In this paper, we redefine the notion of semistructured objects to reflect the existence of partial and inconsistent information, and we study how to integrate such objects spread over various sources while checking their consistency at the same time. We propose a new integration operator for this purpose and discuss its semantic properties.
