Clustering XML Documents Based on the Weight of Frequent Structures

Previous clustering methods for XML documents group documents with similar structures by measuring the structural similarity and distance between them. In this paper, we instead propose a novel clustering method that uses the weight of frequent structures in XML documents, treating each XML document as a transaction and the structures extracted from it as the items of that transaction. Our experimental results show that the method is fast and produces clusters with high cohesion.
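The transaction view described above can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: "structures" are taken to be root-to-element tag paths, "frequent" means meeting a minimum support threshold, and the names `path_items` and `frequent_items` are hypothetical. The weighting scheme and the clustering step of the paper are not reproduced here.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def path_items(xml_text):
    """Return the set of root-to-element tag paths found in one document.

    Assumption: tag paths stand in for the 'structures' the paper extracts.
    """
    root = ET.fromstring(xml_text)
    items = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        items.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return items


def frequent_items(transactions, min_support):
    """Return items that occur in at least min_support (a fraction) of transactions."""
    counts = Counter(item for t in transactions for item in t)
    threshold = min_support * len(transactions)
    return {item for item, count in counts.items() if count >= threshold}


# Toy collection: each document becomes one transaction (a set of path items).
docs = [
    "<book><title/><author/></book>",
    "<book><title/><price/></book>",
    "<article><title/></article>",
]
transactions = [path_items(d) for d in docs]
frequent = frequent_items(transactions, min_support=0.6)
print(sorted(frequent))
```

In this toy collection only `book` and `book/title` appear in at least 60% of the documents, so a transaction-clustering step would work with those two frequent items.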
