A time-based self-organising model for document clustering

Most current approaches to document clustering do not consider the non-stationary nature of real-world document collections. In this paper, we propose a new self-organising model for non-stationary environments, the dynamic adaptive self-organising hybrid (DASH) model. The DASH model runs continuously: new document sets are formed consecutively for training while the old document set is still in the training stage. Knowledge learned from the old data set is adjusted to reflect the new data set, so the document clusters stay up to date. We test the performance of our model on the Reuters-RCV1 news corpus and obtain promising results in terms of classification accuracy and average quantization error.
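As an illustration of the second evaluation criterion, the sketch below computes an average quantization error under a common definition: the mean Euclidean distance between each document vector and its nearest unit (prototype) vector. The abstract does not state the exact formula used, so this definition, the function name `average_quantization_error`, and the toy vectors are assumptions made for illustration only.

```python
import math


def average_quantization_error(documents, units):
    """Mean Euclidean distance from each document vector to its nearest unit.

    Assumed definition for illustration; the paper may use a variant.
    """
    if not documents or not units:
        raise ValueError("documents and units must be non-empty")
    total = 0.0
    for doc in documents:
        # Distance to the best-matching unit for this document.
        total += min(math.dist(doc, unit) for unit in units)
    return total / len(documents)


# Toy data (hypothetical): three 2-D document vectors and two unit vectors.
docs = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
units = [(0.0, 0.0), (0.0, 4.0)]
aqe = average_quantization_error(docs, units)
```

A lower value means the units represent the document vectors more closely; in a non-stationary setting one would presumably recompute it as each new document set arrives.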
