Clustering XML Documents Using Self-organizing Maps for Structures

Self-Organizing Maps capable of encoding structured information are used to cluster XML documents. Documents formatted in XML are naturally represented as graph data structures. It is shown that these Self-Organizing Maps can be trained in an unsupervised fashion to group XML-structured data into clusters, and that the time required for this task scales linearly with the size of the corpus. It is also shown that simple prior knowledge of the data structures helps group the XML documents efficiently.
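As an illustration of the graph representation mentioned above, the following minimal sketch turns an XML document into a list of labeled tree nodes using Python's standard `xml.etree.ElementTree` module. The function name `xml_to_nodes`, the `(tag, child_indices)` node layout, and the post-order traversal are assumptions made for this example; they are not taken from the paper. Post-order is used here on the assumption that a model for structured data would process child nodes before their parents.

```python
import xml.etree.ElementTree as ET


def xml_to_nodes(xml_text):
    """Return the element tree as a post-order list of (tag, child_indices).

    Hypothetical helper for illustration only: each node keeps its tag as a
    label and the list positions of its children, so children always appear
    before their parent in the returned list.
    """
    nodes = []

    def visit(element):
        # Visit children first so their indices are known to the parent.
        child_ids = [visit(child) for child in element]
        nodes.append((element.tag, child_ids))
        return len(nodes) - 1

    visit(ET.fromstring(xml_text))
    return nodes


doc = "<article><title/><body><sec/><sec/></body></article>"
nodes = xml_to_nodes(doc)
for index, (tag, children) in enumerate(nodes):
    print(index, tag, children)
```

In this sketch only the element tags and the parent-child links are kept; text content and attributes are ignored, which matches a purely structural view of the documents.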
