Fast Fuzzy Clustering of Web Page Collections

We study an extension of learning vector quantization that draws on ideas from fuzzy clustering, which lets us find fuzzy clusters of ellipsoidal shape with a competitive learning scheme. The approach can be viewed as a form of online fuzzy clustering, which can offer advantages in the execution time of the clustering algorithm. We demonstrate its usefulness by applying it to web page collections, which are generally difficult to cluster because of their high dimensionality and the special distribution characteristics of the data.
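
To make the idea of online fuzzy clustering with competitive learning concrete, the following is a minimal sketch and not the algorithm studied here. It assumes fuzzy c-means style membership degrees and a decaying learning rate. It uses plain Euclidean distance, so it finds spherical rather than ellipsoidal clusters; handling ellipsoidal shapes would additionally require a per-cluster covariance estimate, which is omitted. All function names, parameters, and the learning rate schedule are hypothetical choices made for illustration.

```python
import random


def memberships(x, centers, fuzzifier=2.0, eps=1e-12):
    """Fuzzy c-means style membership degrees of point x to each center.

    Assumption: inverse squared Euclidean distance, normalized to sum to 1.
    """
    d2 = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) + eps for c in centers]
    power = 1.0 / (fuzzifier - 1.0)
    inv = [(1.0 / d) ** power for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


def online_fuzzy_update(x, centers, learning_rate=0.1, fuzzifier=2.0):
    """Move every center toward x, weighted by its membership degree.

    Classical competitive learning updates only the winning center; here
    all centers are updated, each in proportion to u ** fuzzifier.
    The centers are modified in place.
    """
    u = memberships(x, centers, fuzzifier)
    for c, u_k in zip(centers, u):
        step = learning_rate * (u_k ** fuzzifier)
        for j in range(len(c)):
            c[j] += step * (x[j] - c[j])
    return u


def fit(data, initial_centers, epochs=20, learning_rate=0.1, seed=0):
    """Process the data one point at a time for several shuffled passes."""
    rng = random.Random(seed)
    centers = [list(c) for c in initial_centers]  # copy, leave input intact
    order = list(data)
    for epoch in range(epochs):
        rng.shuffle(order)
        rate = learning_rate / (1 + epoch)  # assumed decay schedule
        for x in order:
            online_fuzzy_update(x, centers, rate)
    return centers


if __name__ == "__main__":
    # Two small, well-separated groups of 2-D points (toy data).
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    print(fit(data, [(2, 2), (8, 8)]))
```

Because each data point triggers an immediate update, the centers start adapting before a full pass over the data is complete, which is the property that can make an online scheme attractive for large collections.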
