Distributed crawler task scheduling method based on a weighted round-robin algorithm
A distributed crawler task scheduling method based on a weighted round-robin algorithm comprises the following steps: 1) according to scale, crawlers are divided into five classes: single-machine multi-threaded crawlers, centralized homogeneous crawlers, centralized heterogeneous crawlers, small-scale distributed crawlers, and large-scale distributed crawlers; 2) the crawler system is deployed in a master-slave architecture; 3) when a crawler node connects to the master node for the first time, the master node assigns it an initial weight; 4) based on the weighted round-robin scheduling algorithm, the master node continuously selects a crawler node and assigns it one task URL to crawl; 5) whenever a crawler node finishes crawling a task URL, it returns the result to the master node, which then updates that crawler node's weight; and so on. The weighted round-robin distributed crawler scheduling algorithm proposed by the invention is designed for small-scale distributed crawlers. It balances the load across crawler nodes and gives the system flexible scalability and fault tolerance with respect to its crawler nodes.
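The scheduling loop in steps 3 to 5 can be sketched as follows. This is a minimal, hypothetical sketch, not the invention's implementation. The abstract specifies neither the exact weighted round-robin variant nor the weight-update rule. The code therefore assumes the smooth weighted round-robin variant used by nginx, an initial weight of 5, and a simple rule that raises a node's weight by 1 after a successful crawl and lowers it by 2 after a failure, clamped to the range 1..10. The names `MasterNode`, `CrawlerNode`, `connect`, `select_node`, `dispatch`, and `report` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class CrawlerNode:
    name: str
    weight: int       # effective weight maintained by the master node
    current: int = 0  # running counter used by smooth weighted round-robin


class MasterNode:
    """Hypothetical master node that hands out task URLs by weighted round-robin."""

    INITIAL_WEIGHT = 5  # assumed initial weight for a newly connected node
    MIN_WEIGHT = 1      # assumed lower bound so a node never drops to weight 0
    MAX_WEIGHT = 10     # assumed upper bound so one node cannot dominate

    def __init__(self) -> None:
        self.nodes: Dict[str, CrawlerNode] = {}

    def connect(self, name: str) -> None:
        # Step 3: a crawler node connects for the first time and gets an initial weight.
        self.nodes[name] = CrawlerNode(name, self.INITIAL_WEIGHT)

    def disconnect(self, name: str) -> None:
        # A failed or removed node simply leaves the rotation (fault tolerance).
        self.nodes.pop(name, None)

    def select_node(self) -> Optional[CrawlerNode]:
        # Step 4: smooth weighted round-robin (the variant used by nginx).
        # Every node's counter grows by its weight; the largest counter wins
        # and is then reduced by the total weight.
        if not self.nodes:
            return None
        total = 0
        best: Optional[CrawlerNode] = None
        for node in self.nodes.values():
            node.current += node.weight
            total += node.weight
            if best is None or node.current > best.current:
                best = node
        best.current -= total
        return best

    def dispatch(self, urls: List[str]) -> List[Tuple[str, str]]:
        # Assign each task URL, one at a time, to the node the scheduler picks.
        plan = []
        for url in urls:
            node = self.select_node()
            if node is None:
                raise RuntimeError("no crawler nodes connected")
            plan.append((url, node.name))
        return plan

    def report(self, name: str, success: bool) -> None:
        # Step 5: the node returns a result; the master updates its weight.
        # Hypothetical rule: +1 on success, -2 on failure, clamped to bounds.
        node = self.nodes.get(name)
        if node is None:
            return
        delta = 1 if success else -2
        node.weight = max(self.MIN_WEIGHT, min(self.MAX_WEIGHT, node.weight + delta))
```

For example, if two connected nodes end up with weights 6 and 3, nine consecutive task URLs are split 6:3 and interleaved (a, b, a, a, b, a, ...) rather than sent in bursts. Removing a node with `disconnect` just drops it from the rotation, which is one way the scalability and fault tolerance described above could be realized.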