An Empirical Study on the Change of Web Pages

As web pages are created, destroyed, and updated dynamically, web databases must be updated frequently to keep their copies of those pages current. Understanding how web pages change helps administrators manage their web databases. This paper introduces a number of metrics that capture different aspects of the change behavior of web pages. We monitored between approximately 1.8 million and 3 million URLs at two-day intervals for 100 days, and we use the proposed metrics to analyze the collected URLs and web pages. In addition, we propose a method that computes the probability that a page will be downloaded in the next crawl.
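The abstract does not define its metrics or its probability method. The sketch below is therefore only an illustration of the general idea, and it is not the authors' method. It assumes that each monitored URL has a per-crawl history in which a failed download is recorded as `None` and a successful download is recorded as a content hash. From that history, it computes three hypothetical quantities. The first, `download_rate`, is the fraction of crawls that succeeded. The second, `modification_rate`, is the fraction of consecutive successful downloads whose content differed. The third, `next_download_probability`, estimates the chance of a successful next download using a substituted technique: a Beta-Bernoulli posterior mean, which is Laplace smoothing when `alpha = beta = 1`. All names and definitions here are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CrawlRecord:
    # Hypothetical record of one crawl attempt for a single URL.
    # content_hash is None when the page could not be downloaded.
    content_hash: Optional[str]


def download_rate(history: Sequence[CrawlRecord]) -> float:
    """Illustrative metric: fraction of crawl attempts that succeeded."""
    if not history:
        return 0.0
    return sum(r.content_hash is not None for r in history) / len(history)


def modification_rate(history: Sequence[CrawlRecord]) -> float:
    """Illustrative metric: fraction of consecutive successful downloads
    whose content hash changed."""
    hashes = [r.content_hash for r in history if r.content_hash is not None]
    pairs = list(zip(hashes, hashes[1:]))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def next_download_probability(
    history: Sequence[CrawlRecord], alpha: float = 1.0, beta: float = 1.0
) -> float:
    """Assumed estimator (not from the paper): posterior mean of a
    Beta(alpha, beta) prior updated with past download successes/failures."""
    successes = sum(r.content_hash is not None for r in history)
    return (successes + alpha) / (len(history) + alpha + beta)


if __name__ == "__main__":
    # Toy history: two identical downloads, one failure, one changed page.
    h = [CrawlRecord("a"), CrawlRecord("a"), CrawlRecord(None), CrawlRecord("b")]
    print(download_rate(h), modification_rate(h), next_download_probability(h))
```

The smoothing in `next_download_probability` keeps a URL with only a few observations from receiving a probability of exactly 0 or 1. With crawls every two days over 100 days, each URL would have roughly 50 observations, so the prior has only a modest effect on the estimate.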