Web Pre-fetching at Proxy Server Using Sequential Data Mining

Reducing the latency of accessing web objects is a major challenge for proxy servers, and techniques such as web caching and web pre-fetching are used to address it. In this paper we integrate web caching and pre-fetching using sequential data mining techniques to enhance the proxy server's performance. The web access logs collected at Squid proxy servers can be used to derive information about users' web navigation patterns, which can in turn be used to improve the performance of the proxy server. The work presented here uses the Pre-order Linked position coded Web Access Pattern (PLWAP) algorithm to find each user's frequently accessed web objects by analyzing the browsing history in the access log files. It then compares the results under page replacement algorithms such as Least Recently Used (LRU) and Least Frequently Used (LFU), first without pre-fetching and then with pre-fetching. The experimental results show that, for each data set, pre-fetching improves the performance of the proxy server.
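
The idea of comparing a replacement policy with and without pre-fetching can be illustrated with a minimal Python sketch. It is not the paper's implementation: a simple "most frequent next page" rule stands in for PLWAP, only LRU is shown, and every name here (`mine_successors`, `LRUCache`, `hit_ratio`, `min_support`, the toy sessions) is a hypothetical chosen for illustration.

```python
from collections import Counter, OrderedDict, defaultdict


def mine_successors(sessions, min_support=2):
    """Map each page to its most frequent immediate successor.

    Assumption: this first-order rule is a simplified stand-in for
    PLWAP, which mines longer sequential patterns from the access log.
    """
    pair_counts = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            pair_counts[current][following] += 1
    rules = {}
    for page, counts in pair_counts.items():
        following, support = counts.most_common(1)[0]
        if support >= min_support:
            rules[page] = following
    return rules


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def __contains__(self, key):
        return key in self.items

    def touch(self, key):
        """Insert key, or mark it as most recently used if present."""
        if key in self.items:
            self.items.move_to_end(key)
            return
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # drop the oldest entry
        self.items[key] = True


def hit_ratio(requests, capacity, rules=None):
    """Replay requests through an LRU cache and return the hit ratio.

    When rules is given, the predicted next page is loaded into the
    cache right after each request (the pre-fetching step).
    """
    cache = LRUCache(capacity)
    hits = 0
    for page in requests:
        if page in cache:
            hits += 1
        cache.touch(page)
        if rules and page in rules:
            cache.touch(rules[page])
    return hits / len(requests)


# Toy data (hypothetical): past sessions for mining, then a new trace.
sessions = [["a", "b", "c"], ["a", "b", "d"], ["x", "y"], ["a", "b", "c"]]
requests = ["a", "b", "c", "x", "a", "b", "c"]
rules = mine_successors(sessions)

print("LRU only:        ", hit_ratio(requests, capacity=2))
print("LRU + pre-fetch: ", hit_ratio(requests, capacity=2, rules=rules))
```

On this toy trace the run with pre-fetching records more cache hits than plain LRU, which mirrors the direction of the result reported above; the actual magnitude in the paper depends on its data sets and on PLWAP's patterns, neither of which this sketch reproduces.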
