Schema Mapping with Quality Assurance for Data Integration

With the growth of the Internet, more and more data is generated online, and because of its usability, a growing share of it is organized as Extensible Markup Language (XML) documents. XML's flexibility, however, means that XML data comes in a wide variety of organizational formats, which makes data management inconvenient. Problems arise in particular when large-scale operations such as data integration or model change are performed on XML data. One current approach is to carry out these operations through Data Exchange. Previous work has mainly analyzed the characteristics of Schema Mapping on XML and established Data Exchange rules. These rules consider only data integrity and reliability; they do not consider the quality of the data after conversion. This paper proposes a quality assurance mechanism. First, we present a new model with quality assurance and provide a suitable method for this model. Then we propose a weak-branch convergence strategy based on the Schema. Finally, theoretical analysis and experimental results show that the method is correct and feasible.
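The abstract does not define its quality model or the weak-branch convergence strategy, so what follows is only a minimal sketch of the general setting it describes: a Schema Mapping drives a Data Exchange from a source XML layout to a target layout, and the converted data is then checked for quality. The path-based mapping and every name in it (`MAPPING`, `exchange`, `completeness`, the `person`/`entry` tags) are hypothetical illustrations. The "completeness" score, the fraction of expected target fields actually populated, is one common data-quality dimension chosen here for illustration; it is not the measure proposed in the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema mapping: source path (relative to a source record)
# -> name of the target child element it populates.
MAPPING = {
    "name": "fullName",
    "contact/email": "email",
    "contact/phone": "phone",
}

def exchange(source_root, record_tag, target_root_tag, target_record_tag, mapping):
    """Translate every source record into a target record using a path mapping."""
    target = ET.Element(target_root_tag)
    for rec in source_root.iter(record_tag):
        out = ET.SubElement(target, target_record_tag)
        for src_path, tgt_name in mapping.items():
            node = rec.find(src_path)
            # Only copy non-empty values; missing ones simply stay absent.
            if node is not None and node.text and node.text.strip():
                ET.SubElement(out, tgt_name).text = node.text.strip()
    return target

def completeness(target, target_record_tag, mapping):
    """Illustrative quality score: fraction of expected target fields that were filled."""
    records = target.findall(target_record_tag)
    expected = len(records) * len(mapping)
    if expected == 0:
        return 1.0
    filled = sum(len(list(r)) for r in records)
    return filled / expected

SOURCE = """
<people>
  <person><name>Ann</name><contact><email>ann@example.org</email><phone>555-0100</phone></contact></person>
  <person><name>Bob</name><contact><email>bob@example.org</email></contact></person>
</people>
"""

src = ET.fromstring(SOURCE)
tgt = exchange(src, "person", "directory", "entry", MAPPING)
print(ET.tostring(tgt, encoding="unicode"))
print(completeness(tgt, "entry", MAPPING))  # 5 of 6 expected fields are filled
```

Because Bob's record has no phone number, the exchange itself succeeds (the rules are satisfied) while the quality score drops below 1. This is the kind of gap between a valid conversion and a good-quality one that the abstract points to.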
