A schema matching-based approach to XML schema clustering

The relationship between XML data clustering and schema matching is bidirectional: clustering techniques have been adopted to improve matching performance, and schema matching in turn serves as the backbone of clustering techniques. This paper presents a new approach to clustering XML schemas based on schema matching. In particular, we develop and implement an XML schema matching system that determines semantic similarities between XML schemas based on the Prüfer sequence representation of schema trees. The proposed similarity computation algorithm makes use of the semantic meaning of XML elements as well as the hierarchical features of XML schemas. The computed similarities are then exploited by an agglomerative clustering algorithm to group similar schemas. Our experimental results show that the proposed approach is fast and accurate in clustering heterogeneous XML schemas.
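As a rough illustration of the two building blocks named above, the following Python sketch encodes a labeled tree as a classic Prüfer sequence and then groups items with a simple agglomerative procedure. It is not the system described in this paper: the function names `prufer_sequence` and `agglomerative`, the integer node numbering, the average-linkage rule, the stopping threshold, and the hand-written similarity matrix are all assumptions made for the example. In particular, how schema similarities are actually derived from the sequences is not specified here and is left out.

```python
import heapq


def prufer_sequence(edges, n):
    """Classic Prüfer sequence of a labeled tree with nodes 1..n (n >= 2).

    Repeatedly removes the leaf with the smallest label and records the
    label of its neighbor until two nodes remain.
    """
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = [v for v in adj if len(adj[v]) == 1]
    heapq.heapify(leaves)
    seq = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)
        parent = adj[leaf].pop()      # a leaf has exactly one neighbor
        adj[parent].remove(leaf)
        seq.append(parent)
        if len(adj[parent]) == 1:     # the neighbor has become a leaf
            heapq.heappush(leaves, parent)
    return seq


def agglomerative(sim, threshold):
    """Average-linkage agglomerative clustering over a similarity matrix.

    Merges the two most similar clusters until the best available
    similarity drops below `threshold` (an assumed stopping rule).
    """
    clusters = [[i] for i in range(len(sim))]

    def link(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, (i, j) = max(
            (link(clusters[i], clusters[j]), (i, j))
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# A small hypothetical schema tree, with elements numbered 1..6.
tree_edges = [(1, 4), (2, 4), (3, 4), (4, 5), (5, 6)]
sequence = prufer_sequence(tree_edges, 6)

# Hypothetical pairwise similarities between three schemas.
similarities = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
groups = agglomerative(similarities, threshold=0.5)
```

The sequence has length n - 2 and determines the tree uniquely, which is what makes it a compact stand-in for the tree structure. Average linkage is only one possible choice; single or complete linkage would slot into `link` in the same way.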
