P-FCM: a proximity-based fuzzy clustering

Abstract In this study, we introduce and study proximity-based fuzzy clustering (P-FCM). As the name suggests, in this mode of clustering the “discovery” of structure in the data is carried out in an unsupervised manner and is augmented by an auxiliary supervision mechanism. This supervision is realized via a number of proximity “hints” (constraints) that specify the extent to which some pairs of patterns are regarded as similar or different. The hints are provided externally to the clustering algorithm and help navigate the search through the set of patterns, which gives rise to a two-phase optimization process. The first phase is the standard FCM, while the second phase is a gradient-driven minimization of the differences between the provided proximity values and those computed on the basis of the partition matrix obtained in the first phase. The proximity type of auxiliary information is discussed in the context of Web mining, where clusters of Web pages are built in the presence of proximity information provided by a user who assigns these degrees on the basis of personal preferences. Numeric studies involve experiments with several synthetic data sets and Web data (pages).
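The two-phase process described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not state how proximities are computed from the partition matrix, so the sketch assumes the form p̂[k1, k2] = Σ_i min(u[i, k1], u[i, k2]). The function names (`fcm`, `induced_proximity`, `hint_mismatch`, `refine_step`), the hint format `(k1, k2, p)`, the step size `alpha`, and the clip-and-renormalize projection are all hypothetical choices made for illustration.

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, seed=0):
    """Phase 1: standard fuzzy c-means. Returns U (c x N) and prototypes V (c x d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)               # memberships of each pattern sum to 1
    for _ in range(n_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)  # weighted prototypes
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # squared distances (c x N)
        inv = np.maximum(d2, 1e-12) ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)    # standard FCM membership update
    return U, V

def induced_proximity(U):
    """Assumed form: p_hat[k1, k2] = sum_i min(u[i, k1], u[i, k2])."""
    return np.minimum(U[:, :, None], U[:, None, :]).sum(axis=0)

def hint_mismatch(U, hints):
    """Sum of squared differences between induced and provided proximities."""
    P = induced_proximity(U)
    return sum((P[k1, k2] - p) ** 2 for k1, k2, p in hints)

def refine_step(U, hints, alpha=0.05):
    """Phase 2 (sketch): one subgradient step on the mismatch, then renormalize."""
    P = induced_proximity(U)
    G = np.zeros_like(U)
    for k1, k2, p in hints:
        err = 2.0 * (P[k1, k2] - p)
        low1 = U[:, k1] <= U[:, k2]   # clusters where u[i, k1] attains the minimum
        G[low1, k1] += err            # subgradient flows to the smaller membership
        G[~low1, k2] += err
    U_new = np.clip(U - alpha * G, 1e-9, None)      # keep memberships positive
    return U_new / U_new.sum(axis=0, keepdims=True)
```

A typical use would run `fcm` once, then apply `refine_step` repeatedly with a list of hints such as `[(0, 3, 1.0)]` (patterns 0 and 3 should be regarded as fully similar) while monitoring `hint_mismatch`. The abstract does not specify a stopping rule or how the two phases are interleaved, so neither is fixed here.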
