Popularity-based PPM: an effective Web prefetching technique for high accuracy and low storage

Prediction by partial match (PPM) is a commonly used technique in Web prefetching, where prefetching decisions are made based on historical URLs in a dynamically maintained Markov prediction tree. Existing approaches either store URL nodes extensively by building the tree with a fixed height in each branch, or store only the branches with frequently accessed URLs. By building popularity information into the Markov prediction tree, we propose a new prefetching model, called popularity-based PPM. In this model, the tree is dynamically updated with a variable height in each set of branches: a popular URL can head a set of long branches, while a less popular document heads a set of short ones. Since the majority of root nodes in our approach are popular URLs, the space allocated for storing nodes is used effectively. We have also included two additional optimizations in this model: (1) directly linking a root node to duplicated popular nodes in a surfing path, to give popular URLs more consideration for prefetching; and (2) applying a space optimization after the tree is built to further remove less popular nodes. Our comparative trace-driven simulations show a significant space reduction and improved prediction accuracy for the proposed prefetching technique.
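The abstract does not say how popularity is mapped to branch height, so the following Python sketch is only an illustration under stated assumptions. It assumes that a URL's popularity grade is the number of hypothetical `thresholds` its global access count reaches, that a grade-g root heads branches of at most g + 1 nodes, and that grade-0 URLs head no branch at all. For simplicity, the access counts are assumed to be precomputed from the trace. The `prune` method is a hypothetical stand-in for the post-build space optimization, implemented as dropping descendants reached fewer than `min_count` times. The root-to-duplicated-popular-node linking optimization is not modeled. All class, method, and parameter names are illustrative.

```python
from bisect import bisect_right
from collections import Counter


class Node:
    """A URL node in the prediction tree; `count` records how often it was reached."""

    def __init__(self, url):
        self.url = url
        self.count = 0
        self.children = {}

    def child(self, url):
        # Create the child node on first visit, then return it.
        if url not in self.children:
            self.children[url] = Node(url)
        return self.children[url]


class PopularityPPM:
    """Sketch of a Markov prediction tree whose branch height depends on root popularity.

    ASSUMPTION: grade = number of `thresholds` a URL's access count reaches;
    a grade-g root heads branches of at most g + 1 nodes; grade-0 URLs head none.
    """

    def __init__(self, url_counts, thresholds=(2, 5, 10)):
        self.url_counts = url_counts
        self.thresholds = sorted(thresholds)
        self.roots = {}

    def grade(self, url):
        return bisect_right(self.thresholds, self.url_counts.get(url, 0))

    def max_height(self, url):
        return self.grade(url) + 1

    def update(self, path):
        """Insert each suffix of a surfing path, truncated to its root's allowed height."""
        for i, url in enumerate(path):
            height = self.max_height(url)
            if height < 2:  # an unpopular URL would only form a childless root
                continue
            node = self.roots.setdefault(url, Node(url))
            node.count += 1
            for nxt in path[i + 1 : i + height]:
                node = node.child(nxt)
                node.count += 1

    def predict(self, history, min_prob=0.3):
        """Return (url, probability) candidates from the longest matching context."""
        for k in range(len(history), 0, -1):
            context = history[-k:]
            node = self.roots.get(context[0])
            for url in context[1:]:
                if node is None:
                    break
                node = node.children.get(url)
            if node is not None and node.children:
                ranked = sorted(
                    ((c.url, c.count / node.count) for c in node.children.values()),
                    key=lambda pair: -pair[1],
                )
                return [(u, p) for u, p in ranked if p >= min_prob]
        return []

    def prune(self, min_count=2):
        """HYPOTHETICAL post-build pass: drop descendants reached fewer than min_count times."""

        def trim(node):
            node.children = {u: c for u, c in node.children.items() if c.count >= min_count}
            for c in node.children.values():
                trim(c)

        for root in self.roots.values():
            trim(root)


# Usage on a toy trace (illustrative data, not from the paper's experiments).
sessions = [
    ["/", "/news", "/news/a"],
    ["/", "/news", "/news/b"],
    ["/", "/news", "/news/a"],
    ["/about", "/"],
]
counts = Counter(url for s in sessions for url in s)
model = PopularityPPM(counts, thresholds=(2, 3, 4))
for s in sessions:
    model.update(s)
print(model.predict(["/"]))  # [('/news', 0.75)]
```

In this toy trace, the most popular URL `/` is allowed branches up to four nodes long, while `/news/b` and `/about` never become roots. Their branches would hold little predictive value, which is the space saving the model aims for.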
