Low-complexity fuzzy relational clustering algorithms for Web mining

This paper presents two new algorithms, fuzzy c-medoids (FCMdd) and robust fuzzy c-medoids (RFCMdd), for fuzzy clustering of relational data. Their objective functions are based on selecting c representative objects (medoids) from the data set such that the total fuzzy dissimilarity within each cluster is minimized. A comparison with the well-known relational fuzzy c-means algorithm (RFCM) shows that FCMdd is computationally more efficient. We present several applications of these algorithms to Web mining, including Web document clustering, snippet clustering, and Web access log analysis.
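To make the medoid-based objective concrete, the following is a minimal sketch of a fuzzy c-medoids style iteration, not the paper's reference implementation. It assumes a precomputed dissimilarity matrix `R`, the standard fuzzy c-means membership formula with fuzzifier `m`, and caller-supplied initial medoids; the function names `fuzzy_memberships` and `fcmdd` and the toy data are hypothetical, and the robust variant (RFCMdd) is not covered.

```python
def fuzzy_memberships(R, medoids, m=2.0, eps=1e-12):
    """Return u[j][i], the membership of object j in the cluster of medoid i.

    Assumption: memberships follow the usual fuzzy c-means form, i.e. they
    are proportional to (1 / dissimilarity) ** (1 / (m - 1)).
    """
    u = []
    for j in range(len(R)):
        # eps guards against division by zero when object j is itself a medoid.
        weights = [(1.0 / (R[j][v] + eps)) ** (1.0 / (m - 1.0)) for v in medoids]
        total = sum(weights)
        u.append([w / total for w in weights])
    return u


def fcmdd(R, initial_medoids, m=2.0, max_iter=100):
    """Alternate membership and medoid updates until the medoids stop changing."""
    n = len(R)
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        u = fuzzy_memberships(R, medoids, m)
        new_medoids = []
        for i in range(len(medoids)):
            # Fuzzy-weighted total dissimilarity if object k were the medoid
            # of cluster i; the object with the smallest cost is selected.
            def cost(k, i=i):
                return sum((u[j][i] ** m) * R[k][j] for j in range(n))
            new_medoids.append(min(range(n), key=cost))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, fuzzy_memberships(R, medoids, m)


# Toy relational data: pairwise absolute differences between points on a line.
points = [0, 1, 2, 10, 11, 12]
R = [[abs(a - b) for b in points] for a in points]
medoids, u = fcmdd(R, initial_medoids=[0, 3])
print(medoids)
print([round(row[0], 3) for row in u])
```

Each medoid update in this sketch scans every object as a candidate, so one pass costs on the order of c·n² dissimilarity lookups; restricting the candidate set is one way to cut this down, though the sketch does not attempt it.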
