Exploiting Web Log Mining for Web Cache Enhancement

Improving the performance of the Web is a crucial requirement, since the Web's popularity has resulted in a large increase in user-perceived latency. In this paper, we describe a Web caching scheme that capitalizes on prefetching. Prefetching refers to the mechanism of deducing a client's forthcoming page accesses from access log information. Web log mining methods are exploited to provide effective prediction of Web-user accesses. The proposed scheme coordinates the two techniques (i.e., caching and prefetching). The prefetched documents are accommodated in a dedicated part of the cache, to avoid the drawback of prefetched documents incorrectly replacing requested ones. Unlike existing schemes for buffer management in database and operating systems, the proposed scheme takes the requirements of the Web into account. Experimental results indicate the superiority of the proposed method over previous ones in terms of cache performance.
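The idea of a dedicated cache region for prefetched documents can be illustrated with a minimal sketch. It is not the scheme evaluated in the paper: the class `PartitionedCache`, the helpers `train_successors` and `predict_next`, the LRU eviction in both regions, and the slot-based capacities are all assumptions made for illustration. A simple first-order successor count stands in for the Web log mining step.

```python
from collections import OrderedDict, defaultdict


class PartitionedCache:
    """Hypothetical LRU cache with a separate region for prefetched documents."""

    def __init__(self, demand_slots, prefetch_slots):
        self.demand = OrderedDict()    # documents a client actually requested
        self.prefetch = OrderedDict()  # documents fetched speculatively
        self.demand_slots = demand_slots
        self.prefetch_slots = prefetch_slots

    @staticmethod
    def _put(region, capacity, url):
        region[url] = True
        region.move_to_end(url)         # mark as most recently used
        while len(region) > capacity:
            region.popitem(last=False)  # evict the least recently used entry

    def request(self, url):
        """Serve a client request; return True on a cache hit."""
        hit = url in self.demand or url in self.prefetch
        # A prefetched document that is really requested moves to the
        # demand region (an assumed promotion policy).
        self.prefetch.pop(url, None)
        self._put(self.demand, self.demand_slots, url)
        return hit

    def add_prefetched(self, url):
        """Store a predicted document without evicting requested ones."""
        if url not in self.demand:
            self._put(self.prefetch, self.prefetch_slots, url)


def train_successors(sessions):
    """Count how often each page is immediately followed by another page."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, url):
    """Return the most frequent successor of url, or None if url is unseen."""
    followers = counts.get(url)
    if not followers:
        return None
    return max(followers, key=followers.get)
```

In this sketch, a proxy would call `request(url)` for each client access, then pass the result of `predict_next(model, url)` to `add_prefetched` when it is not `None`. Because evictions in the prefetch region never touch the demand region, a wrong prediction costs at most a prefetch slot.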
