Clustering with Random Indexing K-tree and XML Structure

This paper describes the approach taken to the clustering task at INEX 2009 by a group at the Queensland University of Technology. The approach applies the Random Indexing (RI) K-tree to a document representation based on the semantic markup available in the INEX 2009 Wikipedia collection. The RI K-tree is a scalable approach to clustering large document collections, and it produced quality clusterings when evaluated using two different methodologies.
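The abstract does not spell out how Random Indexing builds document vectors, so the following is only a minimal sketch of the generic technique, not the representation used in the paper. It assumes each term receives a sparse ternary index vector (a few +1 and -1 entries, zeros elsewhere) and that a document vector is the term-frequency-weighted sum of those index vectors. The function names (`index_vector`, `document_vector`, `cosine`) and the parameter values (`dim`, `nonzero`, `seed`) are hypothetical and chosen for illustration.

```python
import math
import random
from collections import Counter


def index_vector(term, dim=64, nonzero=4, seed=0):
    """Return a sparse ternary index vector for a term (illustrative only).

    The vector has `nonzero` entries set to +1 or -1 in equal numbers
    (when `nonzero` is even) and zeros everywhere else.
    """
    # Seeding with a string makes the vector reproducible for a given term.
    rng = random.Random(f"{seed}:{term}")
    positions = rng.sample(range(dim), nonzero)
    vec = [0] * dim
    for i, pos in enumerate(positions):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def document_vector(tokens, dim=64, nonzero=4, seed=0):
    """Sum the index vectors of a document's terms, weighted by term frequency."""
    doc = [0] * dim
    for term, tf in Counter(tokens).items():
        for j, value in enumerate(index_vector(term, dim, nonzero, seed)):
            if value:
                doc[j] += tf * value
    return doc


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


doc_a = document_vector("xml retrieval with xml markup".split())
doc_b = document_vector("clustering xml documents".split())
similarity = cosine(doc_a, doc_b)
```

The resulting fixed-length vectors could then be passed to any vector-based clustering algorithm; in the paper that role is played by the K-tree, which this sketch does not implement.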
