Algorithms of nonlinear document clustering based on fuzzy multiset model

A fuzzy multiset is applicable as a model of information retrieval because its mathematical structure expresses both the number of occurrences of an element and its degrees of membership simultaneously. Fuzzy multisets are therefore also a suitable model for document clustering. This paper develops clustering algorithms for documents based on a fuzzy multiset model. The standard proximity measure, the cosine correlation, is generalized to the multiset model, and two nonlinear clustering techniques are applied to existing clustering methods. The first introduces a variable for controlling cluster volume sizes; the second is the kernel trick used in support vector machines. Clustering by competitive learning is also studied. When the kernel trick is used, the classification configuration of the data in the high‐dimensional feature space is visualized by self‐organizing maps. Two numerical examples, one using artificial data and one using real document data, are shown, and the effects of the proposed methods are discussed. © 2008 Wiley Periodicals, Inc.
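
The two ingredients named above, a cosine correlation on fuzzy multisets and a kernel-induced distance, can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that a document is stored as a mapping from each term to the list of its membership degrees, that membership sequences are compared position by position after sorting in decreasing order, and that the kernel is Gaussian. The names `inner`, `cosine`, `kernel_sq_distance`, and the parameter `lam` are hypothetical.

```python
import math

def inner(a, b):
    """Inner product of two fuzzy multisets (assumed definition).

    Each multiset maps a term to a list of membership degrees in [0, 1].
    Sequences are sorted in decreasing order and multiplied position-wise;
    zip() truncates to the shorter sequence, which equals zero padding.
    """
    total = 0.0
    for term in a.keys() & b.keys():
        xs = sorted(a[term], reverse=True)
        ys = sorted(b[term], reverse=True)
        total += sum(x * y for x, y in zip(xs, ys))
    return total

def cosine(a, b):
    """Cosine correlation built from the assumed inner product."""
    na, nb = math.sqrt(inner(a, a)), math.sqrt(inner(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # an empty multiset is treated as unrelated to everything
    return inner(a, b) / (na * nb)

def kernel_sq_distance(a, b, lam=1.0):
    """Squared feature-space distance under a Gaussian kernel (assumed choice).

    With K(a, b) = exp(-lam * ||a - b||^2), the kernel trick gives
    ||phi(a) - phi(b)||^2 = K(a, a) - 2 K(a, b) + K(b, b) = 2 - 2 K(a, b).
    """
    d2 = inner(a, a) - 2.0 * inner(a, b) + inner(b, b)
    return 2.0 - 2.0 * math.exp(-lam * max(d2, 0.0))

# Hypothetical documents: "fuzzy" occurs twice with different degrees.
doc1 = {"fuzzy": [0.9, 0.4], "cluster": [0.7]}
doc2 = {"fuzzy": [0.4, 0.9], "kernel": [0.5]}
similarity = cosine(doc1, doc2)
distance = kernel_sq_distance(doc1, doc2, lam=0.5)
```

Because the sequences are sorted before comparison, the order in which occurrences of a term are recorded does not affect either measure. A kernel-based c-means procedure would use `kernel_sq_distance` in place of the ordinary squared distance without ever computing the feature map explicitly.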
