An Optimization Approach for Semantic-based XML Schema Matching

We propose a novel solution for semantic-based XML schema matching that takes a mathematical programming approach. The method finds the globally optimal matching between the leaf nodes of two XML schema trees by reducing the tree-to-tree matching problem to the simpler problems of path-to-path, node-to-node, and word-to-word matching. We formulate each of these as a maximum-weighted bipartite graph matching problem with its own constraints and solve it with a suitable mathematical programming technique, including integer programming and dynamic programming. The solution at each stage supplies the weights for the next stage, until the optimal tree-to-tree matching is obtained. Computational experiments verify the effectiveness of the approach.
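The staged idea can be illustrated with a minimal sketch covering only the word-to-word and node-to-node stages. Everything in it is an assumption made for illustration, not the paper's formulation: the function names are hypothetical, a character-bigram Dice score stands in for a real semantic word similarity, and a small bitmask dynamic program stands in for the integer programming and dynamic programming models, so it only suits small inputs.

```python
import re
from functools import lru_cache


def split_words(name):
    """Split a camelCase label into lowercase words (hypothetical tokenizer)."""
    return [w.lower() for w in re.findall(r"[A-Za-z][a-z]*", name)]


def word_similarity(a, b):
    """Stand-in word score: Dice coefficient over character bigrams."""
    if a == b:
        return 1.0
    ga = {a[i:i + 2] for i in range(len(a) - 1)}
    gb = {b[i:i + 2] for i in range(len(b) - 1)}
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def max_weight_matching(weights):
    """Maximum-weight bipartite matching by dynamic programming over
    (row index, bitmask of used columns). Rows may stay unmatched."""
    n_rows = len(weights)
    n_cols = len(weights[0]) if weights else 0

    @lru_cache(maxsize=None)
    def best(row, used):
        if row == n_rows:
            return 0.0, ()
        # Option 1: leave this row unmatched.
        best_total, best_pairs = best(row + 1, used)
        # Option 2: match this row to any column that is still free.
        for col in range(n_cols):
            if not used & (1 << col):
                sub_total, sub_pairs = best(row + 1, used | (1 << col))
                total = weights[row][col] + sub_total
                if total > best_total:
                    best_total = total
                    best_pairs = ((row, col),) + sub_pairs
        return best_total, best_pairs

    return best(0, 0)


def node_similarity(a, b):
    """Node-to-node score: optimal word-to-word matching, normalized
    by the longer label's word count (an assumed normalization)."""
    wa, wb = split_words(a), split_words(b)
    if not wa or not wb:
        return 0.0
    weights = [[word_similarity(x, y) for y in wb] for x in wa]
    total, _ = max_weight_matching(weights)
    return total / max(len(wa), len(wb))


def match_leaves(source, target):
    """Match leaf labels using node scores as the next stage's weights."""
    weights = [[node_similarity(s, t) for t in target] for s in source]
    _, pairs = max_weight_matching(weights)
    return [(source[r], target[c]) for r, c in pairs]


if __name__ == "__main__":
    src = ["unitPrice", "orderQuantity", "customerName"]
    tgt = ["Quantity", "ClientName", "Price"]
    for s, t in match_leaves(src, tgt):
        print(s, "->", t)
```

The point of the sketch is the data flow: the optimal word-level matching produces a node-level score, and those scores become the edge weights of the next matching problem. A path-to-path stage would plug in the same way, with node scores as its weights.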
