The World Wide Web is an important area for data mining research because of the huge amount of information it holds. The success of the WWW depends on response time, and predictive prefetching is an important technique for reducing latency. To predict a user's next request, millions of server-side web log entries need to be analyzed. Identifying user session boundaries is one of the most important steps in predicting a user's next request from navigation behavior. In this paper, user session boundaries are identified using the IP address and browsing agent, then by considering intersession and intrasession timeouts and immediate link analysis. A complete set of user session sequences, and the learning graph based on these sequences, is also generated. We note that existing works ignore some of the following issues that are important for prediction: analysis of non-prefetchable items; prefetching of objects that are newly created or never visited before; analysis of the aging factor; the size of documents to be cached and cache utilization factors; and analysis of the document duplication process. This paper proposes an algorithm that prefetches objects based on all of the above factors except the document duplication problem. The survey indicates that GDSF-based Predictive Web Caching (NGRAM) and keyword-based semantic prefetching with LRU (KBSP) outperform the other existing methods. In this study, the performance of NGRAM and KBSP is therefore compared against the proposed algorithms. The performance metrics in our experimental study are prefetching hit ratio, byte hit ratio, and waste ratio for different cache sizes.
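The session identification step described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the function name `split_sessions`, the record layout, and the default thresholds are hypothetical, and the reading of the two timeouts (a maximum gap between consecutive requests and a maximum total session duration) is an assumption, since the abstract does not define them. Immediate link analysis is omitted.

```python
from collections import defaultdict


def split_sessions(records, max_gap=600, max_duration=1800):
    """Group log records into sessions.

    records: iterable of (ip, agent, timestamp_seconds, url) tuples
             (an assumed layout, not a real log format).
    max_gap: assumed limit on the time between two consecutive requests.
    max_duration: assumed limit on the total length of one session.
    """
    # Step 1: treat each (IP address, browsing agent) pair as one user.
    by_user = defaultdict(list)
    for ip, agent, ts, url in records:
        by_user[(ip, agent)].append((ts, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # order each user's requests by timestamp
        current = []
        for ts, url in hits:
            # Step 2: close the session when either threshold is exceeded.
            if current and (
                ts - current[-1][0] > max_gap
                or ts - current[0][0] > max_duration
            ):
                sessions.append((user, [u for _, u in current]))
                current = []
            current.append((ts, url))
        if current:
            sessions.append((user, [u for _, u in current]))
    return sessions
```

Each returned item pairs a user key with the ordered list of URLs in one session; such sequences are the kind of input a learning graph could be built from.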