Mining Web Logs with PLSA Based Prediction Model to Improve Web Caching Performance

Web caching is a well-known strategy for improving the performance of web systems. The key to better web caching performance is an efficient replacement policy that keeps popular documents in the cache and evicts rarely used ones. When coupled with web log mining, the replacement policy can decide more accurately which documents should be cached. In this paper, we present a PLSA-based prediction model that predicts user access patterns and interests, and we use it to extend the well-known NGRAM-GDSF caching policy. Extensive experiments are conducted on publicly available web log datasets. The results show that our approach achieves better web-access performance.
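
To make the replacement idea concrete, the following is a minimal sketch of a Greedy-Dual-Size-Frequency (GDSF) style cache, in which each document gets a priority key K = L + F * C / S (L is an inflation value, F the access frequency, C the retrieval cost, S the size) and the document with the lowest key is evicted first. The `predicted` argument is a hypothetical hook showing one plausible way a prediction model's score could be added to the observed frequency; the abstract does not specify how the PLSA output enters the key, so that part, the class name `GDSFCache`, and the uniform cost of 1.0 are assumptions made for illustration.

```python
class GDSFCache:
    """Illustrative GDSF-style cache; not the paper's implementation."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.clock = 0.0   # inflation value L, raised on each eviction
        self.entries = {}  # url -> {"size", "freq", "key"}

    def _key(self, freq, size, cost=1.0):
        # GDSF priority: K = L + F * C / S (cost fixed at 1.0 here, an assumption)
        return self.clock + freq * cost / size

    def access(self, url, size, predicted=0.0):
        """Record a request for `url`; return True on a cache hit.

        `predicted` is a hypothetical predicted-future-frequency score
        (for example from a log-mining model) added to the observed count.
        """
        entry = self.entries.get(url)
        if entry is not None:
            entry["freq"] += 1
            entry["key"] = self._key(entry["freq"] + predicted, entry["size"])
            return True

        if size > self.capacity:
            return False  # document can never fit; do not cache it

        # Evict lowest-key documents until the new one fits.
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda u: self.entries[u]["key"])
            self.clock = self.entries[victim]["key"]
            self.used -= self.entries[victim]["size"]
            del self.entries[victim]

        self.entries[url] = {
            "size": size,
            "freq": 1,
            "key": self._key(1 + predicted, size),
        }
        self.used += size
        return False
```

In this sketch, a document with a high predicted score can outlive one that has been requested more often so far, which is the kind of behavior a prediction-extended policy aims for. The linear scan in `min` keeps the example short; a priority queue would be the usual choice for a real cache.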
