A clustering-based prefetching scheme on a Web cache environment

Web prefetching is an attractive way to reduce both the network resources consumed by Web services and the access latencies perceived by Web users. Unlike Web caching, which exploits temporal locality, Web prefetching exploits the spatial locality of Web objects: it fetches objects that are likely to be accessed in the near future and stores them in advance. A careful combination of the two techniques can therefore significantly improve the performance of the Web infrastructure. Since several caching policies have already been proposed, the challenge is to extend them using data mining techniques. In this paper, we present a clustering-based prefetching scheme in which a graph-based clustering algorithm identifies clusters of "correlated" Web pages based on users' access patterns. The scheme can be integrated easily into a Web proxy server to improve its performance. Through a simulation environment driven by a real data set, we show that the proposed integrated framework is robust and effective in improving the performance of the Web caching environment.
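The general idea of clustering correlated pages from access patterns can be illustrated with a minimal Python sketch. It is not the algorithm proposed in the paper, whose details the abstract does not give. The sketch assumes that two pages are "correlated" when they appear together in at least `min_support` user sessions, and it swaps in plain connected components as a stand-in for the graph-based clustering step. The names `build_graph`, `find_clusters`, `prefetch_candidates`, and `min_support` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(sessions, min_support=2):
    """Build an undirected co-access graph from user sessions.

    Assumption: an edge links two pages that co-occur in at least
    `min_support` sessions. The paper may define correlation differently.
    """
    counts = defaultdict(int)
    for session in sessions:
        # Count each unordered page pair once per session.
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1

    graph = defaultdict(set)
    for (a, b), n in counts.items():
        if n >= min_support:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def find_clusters(graph):
    """Return the connected components of the graph as page clusters.

    Connected components are a simple stand-in for a real graph-based
    clustering algorithm.
    """
    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        clusters.append(component)
    return clusters


def prefetch_candidates(page, clusters, cached):
    """List the pages in the requested page's cluster that are not cached yet."""
    for cluster in clusters:
        if page in cluster:
            return sorted(cluster - {page} - set(cached))
    return []


# Hypothetical sessions, as they might be extracted from a proxy access log.
sessions = [
    ["/a", "/b", "/c"],
    ["/a", "/b"],
    ["/c", "/d"],
    ["/c", "/d"],
    ["/e"],
]
clusters = find_clusters(build_graph(sessions, min_support=2))
to_prefetch = prefetch_candidates("/a", clusters, cached=set())
```

In a proxy, a request for a page would trigger fetching the returned candidates into the cache, while the existing replacement policy continues to decide what to evict.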
