Paper information - A new approach in dynamic prediction for user based web page crawling

Most available Web prediction techniques follow a Markov model. A Web page typically contains a large number of links (URLs), so predicting the next page from among them is hard. Existing approaches predict successfully on a private (personal) computer using different Markov models. On public computers (such as those in a cyber cafe), prediction cannot be done at all, since many people use the same machine. In this paper, we propose a new policy for Web prediction based on the dynamic behavior of users. We demonstrate four procedures that make Web-based prediction faster. Our technique does not require any Web log or usage history on the client machine. Instead of a Markov model, we track the mouse position and its direction of movement to predict the next Web page. The scheme is fully dynamic, since no Web log or any other static or previously collected information is used. We try to minimize, at runtime, the number of Web links on a page that must be considered, in order to achieve better accuracy in dynamic Web prediction. Our approach shows the step-wise build-up of a solid Web prediction program that is appropriate in both private and public scenarios. Overall, this method shows a new way of performing prediction using the dynamic behavior of the users.
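The abstract does not spell out the four procedures, so the following is only a minimal sketch of the general idea it describes: using the pointer's current position and direction of travel to narrow down the links worth considering. The `Link` type, the `candidate_links` function, and the 30-degree cone are all assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import List, NamedTuple, Tuple


class Link(NamedTuple):
    """Hypothetical record: a link's URL and the screen position of its centre."""
    url: str
    x: float
    y: float


def candidate_links(
    prev: Tuple[float, float],
    curr: Tuple[float, float],
    links: List[Link],
    max_angle_deg: float = 30.0,  # assumed cone half-width, not from the paper
) -> List[Link]:
    """Return the links lying within a cone around the pointer's direction
    of travel, ordered from best-aligned to worst-aligned."""
    dx, dy = curr[0] - prev[0], curr[1] - prev[1]
    if dx == 0 and dy == 0:
        return []  # the pointer did not move, so there is no direction to use

    heading = math.atan2(dy, dx)
    scored = []
    for link in links:
        lx, ly = link.x - curr[0], link.y - curr[1]
        if lx == 0 and ly == 0:
            scored.append((0.0, link))  # pointer is already on this link
            continue
        bearing = math.atan2(ly, lx)
        # Smallest absolute difference between the two angles, in [0, pi].
        diff = abs((bearing - heading + math.pi) % (2 * math.pi) - math.pi)
        if math.degrees(diff) <= max_angle_deg:
            scored.append((diff, link))

    scored.sort(key=lambda pair: pair[0])
    return [link for _, link in scored]


if __name__ == "__main__":
    page_links = [
        Link("/a", 200, 100),  # straight ahead of the pointer
        Link("/b", 200, 130),  # slightly off the line of travel
        Link("/c", 0, 100),    # behind the pointer
        Link("/d", 100, 300),  # perpendicular to the line of travel
    ]
    for link in candidate_links((50, 100), (100, 100), page_links):
        print(link.url)
```

Only two consecutive pointer samples are needed, which matches the abstract's point that no Web log or usage history has to be kept on the client machine.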

Anirban Kundu | Sutirtha Kr. Guha | Arnab Mitra | Tapas Mukherjee
