A Hierarchical Fuzzy Clustering-based System to Create User Profiles

The rapid development of the World Wide Web as a medium of commerce and information dissemination has generated growing interest among web portal managers in systems able to identify user profiles from web access logs. The interpretation of these profiles can help re-organize the web portal, e.g., by restructuring the site’s content more efficiently, or even help build adaptive web portals, i.e., portals whose organization and presentation change depending on the specific visitor’s needs. In this paper, we assume that the pages of the web portal have been prearranged in a number of different categories. We introduce a systematic approach to determine a hierarchy of user profiles from the history of users’ accesses to the categories. First, we filter the access log by removing both occasional users and categories of little interest. Then, we apply an Unsupervised Fuzzy Divisive Hierarchical Clustering (UFDHC) algorithm to cluster the users of the web portal into a hierarchy of fuzzy groups, each characterized by a set of common interests and represented by a prototype, which defines the profile of the group’s typical member. To identify the profile to which a specific user belongs, we propose a novel classification method that fully exploits the information contained in the hierarchy. To demonstrate the effectiveness of our approach, we apply the UFDHC algorithm to access log data collected over a period of 15 days and use the classification method to associate a profile with the users defined by access log data collected during the subsequent 60 days. Finally, we highlight the strengths of our system by comparing our results with those obtained by applying a profiling system based on a modified version of the fuzzy C-means algorithm.
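
The filtering step and the notion of fuzzy membership in a group prototype can be illustrated with a minimal Python sketch. It is not the UFDHC algorithm or the proposed classification method, neither of which is specified in this abstract. The function names, the two thresholds, the order of the two filters, and the toy data are all assumptions made for illustration, and the membership formula is the standard fuzzy C-means one rather than anything taken from the paper.

```python
from math import dist


def filter_log(counts, min_user_hits, min_cat_hits):
    """Drop low-traffic categories, then occasional users.

    counts: dict user -> dict category -> number of accesses.
    Both thresholds and the filter order are hypothetical.
    """
    # Total accesses per category across all users.
    cat_totals = {}
    for row in counts.values():
        for cat, n in row.items():
            cat_totals[cat] = cat_totals.get(cat, 0) + n
    kept_cats = sorted(c for c, n in cat_totals.items() if n >= min_cat_hits)

    filtered = {}
    for user, row in counts.items():
        vec = [row.get(c, 0) for c in kept_cats]
        total = sum(vec)
        if total > 0 and total >= min_user_hits:
            # Normalise counts to relative interest per category.
            filtered[user] = [v / total for v in vec]
    return kept_cats, filtered


def fuzzy_memberships(x, prototypes, m=2.0):
    """Standard fuzzy C-means membership of x in each prototype."""
    d = [dist(x, p) for p in prototypes]
    # A point that coincides with a prototype belongs fully to it.
    if any(di == 0.0 for di in d):
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in d) for di in d]


# Toy access counts (invented for illustration only).
counts = {
    "u1": {"sport": 8, "news": 2, "rare": 1},
    "u2": {"sport": 1, "news": 9},
    "u3": {"news": 1},
}
kept_cats, profiles = filter_log(counts, min_user_hits=5, min_cat_hits=2)

# Two hypothetical prototypes over the kept categories (news, sport).
prototypes = [[0.9, 0.1], [0.1, 0.9]]
u1_membership = fuzzy_memberships(profiles["u1"], prototypes)
```

In this toy run the category `rare` and the user `u3` are filtered out, and `u1` receives a high degree of membership in the second (sport-oriented) prototype. The memberships sum to one, so a user can belong partially to several groups at once.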
