Managing uncertainty of XML schema matching

Despite advances in machine learning technologies, a schema matching result between two database schemas (e.g., one derived by COMA++) is likely to be imprecise. In particular, numerous “possible mappings” between the schemas may be derived from the matching result. In this paper, we study the problem of managing possible mappings between two heterogeneous XML schemas. We observe that the possible mappings between XML schemas have a high degree of overlap. We hence propose a novel data structure, called the block tree, to capture the commonalities among possible mappings. The block tree represents the possible mappings compactly and can be generated efficiently. Moreover, it supports the evaluation of the probabilistic twig query (PTQ), which returns the probabilities of the portions of an XML document that match the query pattern. For users who are interested only in the answers with the k highest probabilities, we also propose the top-k PTQ and present an efficient solution for it. A second challenge we tackle is to efficiently generate possible mappings for a given schema matching. While this problem can be solved by existing algorithms, we show how to improve the performance of the solution by using a divide-and-conquer approach. An extensive evaluation on realistic datasets shows that our approaches significantly improve the efficiency of generating, storing, and querying possible mappings.
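To make the notion of "possible mappings" concrete, the following minimal Python sketch enumerates them by brute force for a tiny matching and returns the k most probable ones. It is an illustration only, not the block tree or the divide-and-conquer algorithm of the paper. The element names, the similarity scores, the one-to-one constraint, and the rule that a mapping's probability is proportional to the sum of its scores are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical matcher output: (source element, target element, similarity score).
matching = [
    ("order/id", "purchase/number", 0.9),
    ("order/id", "purchase/ref", 0.6),
    ("order/date", "purchase/day", 0.8),
]

def possible_mappings(matching):
    """Enumerate every non-empty subset of correspondences in which each
    source and each target element appears at most once (brute force)."""
    result = []
    for size in range(1, len(matching) + 1):
        for subset in combinations(matching, size):
            sources = {s for s, _, _ in subset}
            targets = {t for _, t, _ in subset}
            if len(sources) == size and len(targets) == size:
                result.append(subset)
    return result

def top_k_mappings(matching, k):
    """Return the k mappings with the highest probability.

    Assumption: a mapping's probability is its total similarity score,
    normalized over all possible mappings.
    """
    mappings = possible_mappings(matching)
    scores = [sum(score for _, _, score in m) for m in mappings]
    total = sum(scores)
    ranked = sorted(zip(mappings, scores), key=lambda pair: pair[1], reverse=True)
    return [(m, s / total) for m, s in ranked[:k]]

if __name__ == "__main__":
    for mapping, prob in top_k_mappings(matching, 2):
        pairs = [(s, t) for s, t, _ in mapping]
        print(f"{prob:.3f}  {pairs}")
```

Even in this toy case, the two most probable mappings share the correspondence for `order/date`, which is the kind of overlap a shared structure could store once. The exhaustive enumeration grows exponentially with the number of correspondences, so it is usable only for very small inputs.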
