Fuzzy Multiset Model and Methods of Nonlinear Document Clustering for Information Retrieval

A fuzzy multiset model is reviewed as a model of information retrieval on the WWW, and a family of fuzzy document clustering algorithms is developed. The fuzzy multiset model is extended to suit clustering applications. The cosine coefficient, a standard proximity measure, is generalized to the multiset model, and two basic objective functions of fuzzy c-means are considered. Moreover, two methods of handling nonlinear classification are proposed: the introduction of a cluster volume variable, and the kernel trick used in support vector machines. A crisp c-means algorithm and clustering by competitive learning are also studied. A numerical example based on real documents is shown.
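
To make the idea of a cosine coefficient on fuzzy multisets concrete, the following is a minimal sketch and not necessarily the definition adopted in the paper. It assumes that a document is a mapping from a term to the list of membership grades of that term's occurrences, that the grades of each term are sorted in decreasing order before comparison, and that the inner product is the sum of products of grades at matching positions. The names `inner_product`, `cosine_coefficient`, and the sample documents are hypothetical.

```python
import math

# Assumed representation: a fuzzy multiset is a dict mapping each term
# to a list of membership grades in [0, 1], one grade per occurrence.

def sorted_grades(fms):
    """Return a copy with each term's grades sorted in decreasing order."""
    return {term: sorted(grades, reverse=True) for term, grades in fms.items()}

def inner_product(a, b):
    """Sum of products of position-wise matched grades (an assumed definition)."""
    a, b = sorted_grades(a), sorted_grades(b)
    total = 0.0
    for term in a.keys() & b.keys():
        # zip stops at the shorter list; the missing grades are treated
        # as zero, so they would contribute nothing to the sum anyway.
        for u, v in zip(a[term], b[term]):
            total += u * v
    return total

def cosine_coefficient(a, b):
    """Inner product divided by the product of norms; 0.0 if either is empty."""
    denom = math.sqrt(inner_product(a, a) * inner_product(b, b))
    return inner_product(a, b) / denom if denom > 0 else 0.0

# Hypothetical documents: "fuzzy" occurs twice in doc1 with different grades.
doc1 = {"fuzzy": [0.9, 0.4], "cluster": [0.7]}
doc2 = {"fuzzy": [0.5], "kernel": [0.8]}
doc3 = {"retrieval": [1.0]}

print(cosine_coefficient(doc1, doc2))
print(cosine_coefficient(doc1, doc3))
```

When every term has a single occurrence with grade 1, this sketch reduces to the ordinary cosine coefficient on binary term vectors, which is the sense in which it generalizes the standard measure.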