A dynamic reconfiguration model for a distributed web crawling system

A web crawling system built on a parallel and distributed architecture needs a mechanism to bring the whole system into a coordinated state when nodes are added to or removed from it. This paper presents an efficient dynamic reconfiguration model that can be used in such a system. The study shows that the model yields desirable properties, such as balanced load across nodes and low network traffic within the system, which contribute to high performance. The model is currently being implemented in WebGather, a well-known Chinese and English web search engine.
