A modified fuzzy ART for soft document clustering

Document clustering has become increasingly useful, especially since the advent of the World Wide Web. Most existing document-clustering algorithms either produce clusters of poor quality or are computationally expensive. In this paper we propose a document-clustering algorithm, KMART, that uses an unsupervised fuzzy adaptive resonance theory (fuzzy ART) neural network. A modified version of fuzzy ART is used so that a document can belong to multiple clusters. The number of clusters is determined dynamically. We report experiments that compare the efficiency and execution time of our algorithm with those of other document-clustering algorithms such as fuzzy c-means. The results show that KMART is both effective and efficient.
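The abstract does not spell out how fuzzy ART is modified, so the following is only a minimal sketch of one plausible reading, not the KMART algorithm itself. It uses the standard fuzzy ART ingredients (complement coding, the fuzzy AND as a component-wise minimum, the vigilance test |I ∧ w| / |I| ≥ ρ, and the learning rule w ← β(I ∧ w) + (1 − β)w), but assumes that every category passing the vigilance test receives the document, rather than only the single best match. The function names, the parameter defaults, and the toy input are all hypothetical.

```python
def complement_code(a):
    """Complement-code a feature vector whose entries lie in [0, 1]."""
    return list(a) + [1.0 - x for x in a]


def fuzzy_and(x, y):
    """Fuzzy AND of two vectors: the component-wise minimum."""
    return [min(p, q) for p, q in zip(x, y)]


def soft_fuzzy_art(docs, rho=0.75, beta=1.0):
    """Soft-clustering sketch built on fuzzy ART (an assumed variant).

    docs : list of feature vectors with entries in [0, 1]
    rho  : vigilance parameter; higher values give more, tighter clusters
    beta : learning rate; 1.0 corresponds to fast learning

    Returns (weights, memberships), where memberships[k] lists the
    indices of every cluster that document k was assigned to.
    """
    weights = []      # one weight vector per cluster, grown on demand
    memberships = []
    for a in docs:
        i_vec = complement_code(a)
        norm_i = sum(i_vec)  # equals len(a) under complement coding
        matched = []
        for j, w in enumerate(weights):
            overlap = fuzzy_and(i_vec, w)
            # Vigilance test: assumed here to be applied to every cluster,
            # so a document may resonate with several clusters at once.
            if sum(overlap) / norm_i >= rho:
                matched.append(j)
                weights[j] = [beta * o + (1.0 - beta) * wv
                              for o, wv in zip(overlap, w)]
        if not matched:
            # No cluster is similar enough: create a new one, so the
            # number of clusters is determined by the data and rho.
            weights.append(i_vec)
            matched.append(len(weights) - 1)
        memberships.append(matched)
    return weights, memberships


if __name__ == "__main__":
    # Toy two-term "documents"; the third overlaps both earlier ones.
    toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    weights, memberships = soft_fuzzy_art(toy_docs, rho=0.5)
    print(memberships)  # [[0], [1], [0, 1]]
```

In this sketch the vigilance parameter `rho` is the only control on cluster count: raising it makes the match test stricter, which tends to create more clusters and fewer multi-cluster assignments.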
