Paper Information - A Distributed Clustering Algorithm for Web-Based Access Patterns

ABSTRACT We introduce a distributed document clustering algorithm based on user access patterns for multi-server Web sites. Our algorithm makes it possible to exploit simultaneously adaptive document replication and persistent connections, two techniques that are most effective in decreasing the response time observed by Web users. The algorithm first distributes the user access data evenly among the servers by using a hash function. Then, each server generates a local clustering on its fair share of the user session records by employing a traditional single-machine document clustering algorithm. Finally, the local clustering results are combined by using a novel procedure that generates maximal large itemsets of Web documents. We present preliminary experimental results and discuss alternative approaches to be pursued in the future.
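The three phases named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it is hypothetical. It makes three assumptions that the abstract does not confirm: sessions are assigned to servers by an MD5 hash of a session identifier; the local clustering step is replaced by a simple stand-in that merges sessions sharing at least one document; and an itemset counts as "large" when it is contained in at least `min_support` local clusters.

```python
import hashlib
from itertools import combinations


def assign_server(session_id: str, num_servers: int) -> int:
    """Map a session to a server with a stable hash (MD5 is an assumed choice)."""
    digest = hashlib.md5(session_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_servers


def partition_sessions(sessions: dict, num_servers: int) -> list:
    """Phase 1: give each server its share of the session records."""
    shares = [[] for _ in range(num_servers)]
    for session_id, docs in sessions.items():
        shares[assign_server(session_id, num_servers)].append(set(docs))
    return shares


def local_clusters(share: list) -> list:
    """Phase 2 stand-in (assumption): merge sessions that share a document.

    Any single-machine document clustering algorithm could be used here.
    """
    clusters = []
    for docs in share:
        merged = set(docs)
        untouched = []
        for cluster in clusters:
            if cluster & merged:
                merged |= cluster  # overlapping cluster is absorbed
            else:
                untouched.append(cluster)
        clusters = untouched + [merged]
    return [frozenset(c) for c in clusters]


def maximal_large_itemsets(clusters: list, min_support: int) -> list:
    """Phase 3 (assumed definition): itemsets contained in at least
    min_support local clusters, keeping only those with no large superset.

    Every maximal large itemset equals the intersection of some group of
    min_support clusters, so those intersections are the only candidates.
    """
    candidates = set()
    for group in combinations(clusters, min_support):
        common = frozenset.intersection(*group)
        if common:
            candidates.add(common)
    return [c for c in candidates if not any(c < other for other in candidates)]


def distributed_clustering(sessions: dict, num_servers: int, min_support: int) -> list:
    """Run the three phases in sequence on one machine for illustration."""
    all_local = []
    for share in partition_sessions(sessions, num_servers):
        all_local.extend(local_clusters(share))
    return maximal_large_itemsets(all_local, min_support)
```

Calling `distributed_clustering(sessions, num_servers=4, min_support=2)` on a dictionary that maps session identifiers to lists of document URLs returns the document groups that at least two local clusters agree on. The brute-force enumeration in the last phase grows combinatorially with the number of local clusters, so it is suitable only for small illustrative inputs.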

Mehmet Sayal | Page Mill Road
