Clustering of text documents by projective dimension of subspaces using the PART neural network

This paper deals with the clustering of text documents using neural networks. Text documents are represented with the Vector Space (VS) model, which describes a document collection by a VS matrix X. Clustering in the multidimensional space of X has high computational complexity, so the space has to be reduced. Our approach reduces the text document space by decomposing the multidimensional space of X through projection into subspaces. The subspaces are created with the Projective Adaptive Resonance Theory (PART) neural network, which provides both this reduction of the multidimensional text document space and the clustering of the text documents. The efficiency of clustering text documents in subspaces of the multidimensional space is influenced by the properties of PART, so the parameters of PART have to be set optimally. With properly set distance and vigilance parameters of PART, it is possible to find the clusters and their centers in the projective dimensions of the subspaces, and to create an outlier cluster for noisy data sets. Applying the PART neural network to text document clustering makes it easy to discover the intrinsic clusters in the document sets used.
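As a rough illustration of the representation described above, the following minimal sketch builds a VS matrix X from a few toy documents and projects it onto a chosen subset of dimensions. It rests on several assumptions that are not taken from the paper: raw term counts are used as weights (the paper may use a different weighting), tokenization is plain whitespace splitting, and the function names `build_vs_matrix` and `project` are hypothetical. It does not implement PART; in PART the projective dimensions are selected by the network, whereas here they are picked by hand.

```python
from collections import Counter


def build_vs_matrix(documents):
    """Return (vocabulary, X), where X[i][j] is the count of term j in document i.

    Assumption: raw term counts and whitespace tokenization, for illustration only.
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    X = []
    for tokens in tokenized:
        counts = Counter(tokens)
        X.append([counts[term] for term in vocabulary])
    return vocabulary, X


def project(X, dims):
    """Keep only the columns listed in dims, i.e. project rows onto a subspace."""
    return [[row[d] for d in dims] for row in X]


docs = [
    "neural network clustering",
    "text clustering text",
    "neural network",
]
vocabulary, X = build_vs_matrix(docs)

# Hand-picked subspace spanned by the terms "network" and "neural".
dims = [vocabulary.index("network"), vocabulary.index("neural")]
X_sub = project(X, dims)
```

In the projected matrix, the first and third documents coincide on the selected dimensions, which is the kind of subspace structure a projective clustering method looks for.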
