Tree Mining Application to Matching of Heterogeneous Knowledge Representations

Matching of heterogeneous knowledge sources is of increasing importance in areas such as scientific knowledge management, e-commerce, enterprise application integration, and many emerging Semantic Web applications. Because these fields depend on knowledge sharing and reuse, knowledge from different organizations within the same domain commonly has to be matched. We propose a knowledge matching method based on our previously developed tree mining algorithms, which extract frequently occurring subtrees from a tree-structured database such as a collection of XML documents. Using this method, the structure common to the different representations can be extracted automatically. Our focus is on knowledge matching at the structural level, and we evaluate the method on a set of example XML schema documents from the same domain. We also discuss some important issues that arise when tree mining algorithms are applied to the detection of common document structures. The experiments demonstrate the usefulness of the approach.
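
As a rough illustration of the underlying idea, the following Python sketch counts how many documents contain each parent-child label pair and keeps the pairs that reach a minimum support. This is a deliberate simplification and not the tree mining algorithms referred to above: it only considers two-node subtrees, and the function names, the sample documents, and the `min_support` parameter are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def edge_set(xml_text):
    """Return the set of (parent_tag, child_tag) pairs in one XML document."""
    root = ET.fromstring(xml_text)
    edges = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node:
            edges.add((node.tag, child.tag))
            stack.append(child)
    return edges


def frequent_edges(documents, min_support):
    """Keep the edges that occur in at least min_support documents.

    Each document contributes at most once per edge, so the count is the
    number of documents containing the edge, not the number of occurrences.
    """
    counts = Counter()
    for doc in documents:
        counts.update(edge_set(doc))
    return {edge for edge, n in counts.items() if n >= min_support}


# Hypothetical documents describing the same domain in slightly different ways.
docs = [
    "<order><customer><name/></customer><item><price/></item></order>",
    "<order><customer><name/><email/></customer>"
    "<item><price/><qty/></item></order>",
    "<order><buyer><name/></buyer><item><price/></item></order>",
]

# Structure shared by every document in the sample.
common = frequent_edges(docs, min_support=len(docs))
print(sorted(common))
```

Lowering `min_support` admits structure that only some of the sources share, which is one reason the choice of support threshold matters when looking for common document structure.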
