An Efficient Parallel Crawler in Grid Environment

As the web grows, it becomes imperative to run multiple crawlers in parallel to gather data for search engines. In this paper we study parallel crawling schemes in a grid environment. We propose and implement an advanced parallel crawler that uses dynamic partitioning, and we evaluate the crawling scheme using metrics for parallel crawlers. An experimental system built on grid middleware has been tested in a real application.