Study on Efficiency of Predictive Prefetching and Caching Algorithms

The World Wide Web is an important area for data mining research because of the huge amount of information it holds. The success of the WWW depends on response time, and predictive prefetching is an important technique for reducing latency. To predict user requests, millions of server-side web log entries need to be analyzed. Identifying user session boundaries is one of the most important steps in predicting a user's next request from their navigation behavior. In this paper, user session boundaries are identified using the IP address and browsing agent, then by considering intersession and intrasession timeouts and immediate link analysis. A complete set of user session sequences is generated, along with a learning graph based on these sequences. We note that existing works ignore some of the following issues, which are important for prediction: analysis of non-prefetchable items; prefetching of objects that are newly created or have never been visited; analysis of the aging factor; the size of documents to be cached and cache utilization factors; and analysis of the document duplication process. This paper proposes an algorithm that prefetches objects based on all of the above factors except the document duplication problem. The survey indicates that GDSF-based predictive web caching (NGRAM) and keyword-based semantic prefetching with LRU (KBSP) outperform the other existing methods. In this study, therefore, the performance of NGRAM and KBSP is compared against that of the proposed approach. The performance metrics in our experimental study are prefetching hit ratio, byte hit ratio, and waste ratio for different cache sizes.
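The session identification step described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes that requests are first grouped by (IP address, browsing agent) and that a new session begins when either of two timeouts is exceeded. The names Request, split_sessions, max_gap, and max_duration are hypothetical. The timeout values of 600 and 1800 seconds are illustrative only, and how these two timeouts map onto the paper's intersession and intrasession timeouts is an assumption. Immediate link analysis is omitted.

```python
from collections import defaultdict
from typing import NamedTuple


class Request(NamedTuple):
    """One parsed web log entry (hypothetical structure for this sketch)."""
    ip: str
    agent: str
    timestamp: float  # seconds since some fixed epoch
    url: str


def split_sessions(requests, max_gap=600.0, max_duration=1800.0):
    """Group requests into sessions per (ip, agent) pair using two timeouts.

    max_gap: longest allowed idle time between consecutive requests (assumed).
    max_duration: longest allowed total session length (assumed).
    """
    # Step 1: treat each distinct (IP address, browsing agent) pair as one user.
    by_user = defaultdict(list)
    for r in requests:
        by_user[(r.ip, r.agent)].append(r)

    # Step 2: walk each user's requests in time order and cut at timeouts.
    sessions = []
    for user_requests in by_user.values():
        user_requests.sort(key=lambda r: r.timestamp)
        current = []
        for r in user_requests:
            if current and (
                r.timestamp - current[-1].timestamp > max_gap
                or r.timestamp - current[0].timestamp > max_duration
            ):
                sessions.append(current)
                current = []
            current.append(r)
        if current:
            sessions.append(current)
    return sessions


# Tiny made-up log: one IP address seen with two different browsing agents.
log = [
    Request("10.0.0.1", "Firefox", 0.0, "/index.html"),
    Request("10.0.0.1", "Firefox", 120.0, "/news.html"),
    Request("10.0.0.1", "Firefox", 2000.0, "/index.html"),
    Request("10.0.0.1", "Chrome", 60.0, "/about.html"),
]
sessions = split_sessions(log)
session_urls = [[r.url for r in s] for s in sessions]
print(session_urls)
```

In this made-up log, the third Firefox request arrives long after the second and therefore opens a new session, while the Chrome request is treated as a separate user even though it shares the IP address. The resulting URL sequences are the kind of input from which a learning graph of navigation behavior could be built.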