A Generalization-Based Approach to Clustering of Web Usage Sessions

The clustering of Web usage sessions based on the access patterns is studied. Access patterns of Web users are extracted from Web server log files, and then organized into sessions which represent episodes of interaction between the Web users and the Web server. Using attribute-oriented induction, the sessions are then generalized according to a page hierarchy which organizes pages based on their contents. These generalized sessions are finally clustered using a hierarchical clustering method. Our experiments on a large real data set show that the approach is efficient and practical for Web mining applications.

[1]  Mark Levene,et al.  Mining Association Rules in Hypertext Databases , 1998, KDD.

[2]  Allison Woodruff,et al.  An Investigation of Documents from the World Wide Web , 1996, Comput. Networks.

[3]  Maurice D. Mulvenna,et al.  Discovering Internet marketing intelligence through online analytical web usage mining , 1998, SGMD.

[4]  Oren Etzioni,et al.  Adaptive Web Sites: Automatically Synthesizing Web Pages , 1998, AAAI/IAAI.

[5]  Sankar K. Pal,et al.  Fuzzy models for pattern recognition , 1992 .

[6]  Dimitrios Gunopulos,et al.  Automatic subspace clustering of high dimensional data for data mining applications , 1998, SIGMOD '98.

[7]  Philip S. Yu,et al.  Efficient Data Mining for Path Traversal Patterns in Distributed Systems , 1996, ICDCS.

[8]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[9]  Cyrus Shahabi,et al.  Knowledge discovery from users Web-page navigation , 1997, Proceedings Seventh International Workshop on Research Issues in Data Engineering. High Performance Database Management for Large-Scale Applications.

[10]  Oren Etzioni,et al.  Fast and Intuitive Clustering of Web Documents , 1997, KDD.

[11]  Philip S. Yu,et al.  Data mining for path traversal patterns in a web environment , 1996, Proceedings of 16th International Conference on Distributed Computing Systems.

[12]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[13]  Jiawei Han,et al.  Knowledge Discovery in Databases: An Attribute-Oriented Approach , 1992, VLDB.

[14]  Peter J. Rousseeuw,et al.  Finding Groups in Data: An Introduction to Cluster Analysis , 1990 .

[15]  Jiawei Han,et al.  Dynamic Generation and Refinement of Concept Hierarchies for Knowledge Discovery in Databases , 1994, KDD Workshop.

[16]  Jiawei Han,et al.  Discovering Web access patterns and trends by applying OLAP and data mining technology on Web logs , 1998, Proceedings IEEE International Forum on Research and Technology Advances in Digital Libraries -ADL'98-.

[17]  Ryszard S. Michalski,et al.  Automated Construction of Classifications: Conceptual Clustering Versus Numerical Taxonomy , 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Oren Etzioni,et al.  The World-Wide Web: quagmire or gold mine? , 1996, CACM.

[19]  Umeshwar Dayal,et al.  From User Access Patterns to Dynamic Hypertext Linking , 1996, Comput. Networks.

[20]  Jaideep Srivastava,et al.  Web mining: information and pattern discovery on the World Wide Web , 1997, Proceedings Ninth IEEE International Conference on Tools with Artificial Intelligence.