Paper information - Implementation of Parallel Web Crawler through .NET Technology

The World Wide Web is growing very rapidly, and the information available on it changes very frequently. Because the web is so dynamic, it is difficult to obtain relevant and fresh information. In this paper we design and develop a web crawler that uses multiple HTTP connections to crawl the web. The connections are implemented with multiple threads, which reduces the overall download time. The system is built on .NET technology: the proposed approach is implemented in VB.NET, using multithreading to crawl web pages in parallel, and the crawled data is stored in a central database (SQL Server). Duplicate records are detected by a stored procedure, which is precompiled and therefore performs the check very quickly. The proposed architecture is fast and allows many crawlers to crawl data in parallel.
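
The architecture summarized in the abstract (parallel download threads, a central store, and a duplicate check on insert) can be illustrated with a minimal sketch. This is not the authors' VB.NET code. It is written in Python, it uses an in-memory SQLite table in place of SQL Server, and it approximates the stored procedure with a single `INSERT OR IGNORE` statement. The names `FAKE_WEB`, `fetch`, `PageStore`, and `crawl` are hypothetical, and the "web" is a small dictionary so that the example runs without network access.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# It stands in for real HTTP downloads so the sketch runs offline.
FAKE_WEB = {
    "http://a.example/": ("page A", ["http://b.example/", "http://c.example/"]),
    "http://b.example/": ("page B", ["http://a.example/", "http://c.example/"]),
    "http://c.example/": ("page C", ["http://a.example/"]),
}


def fetch(url):
    """Stand-in for one HTTP request; returns (text, links)."""
    return FAKE_WEB.get(url, ("", []))


class PageStore:
    """Central store. SQLite replaces SQL Server purely for illustration."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:", check_same_thread=False)
        self._conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, body TEXT)")
        self._lock = threading.Lock()  # serializes access to the shared connection

    def claim(self, url):
        """Record the URL once; return False if it was already recorded.

        This plays the role of the duplicate-checking stored procedure.
        """
        with self._lock:
            cur = self._conn.execute(
                "INSERT OR IGNORE INTO pages (url, body) VALUES (?, NULL)", (url,)
            )
            self._conn.commit()
            return cur.rowcount == 1

    def save(self, url, body):
        """Attach the downloaded text to an already claimed URL."""
        with self._lock:
            self._conn.execute("UPDATE pages SET body = ? WHERE url = ?", (body, url))
            self._conn.commit()

    def urls(self):
        with self._lock:
            return sorted(row[0] for row in self._conn.execute("SELECT url FROM pages"))


def crawl(seeds, workers=4):
    """Crawl level by level, downloading each level's URLs in parallel threads."""
    store = PageStore()

    def visit(url):
        body, links = fetch(url)
        store.save(url, body)
        # Only links that this thread is the first to claim are crawled later.
        return [link for link in links if store.claim(link)]

    frontier = [url for url in seeds if store.claim(url)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            frontier = [link for found in pool.map(visit, frontier) for link in found]
    return store
```

Calling `crawl(["http://a.example/"])` visits each of the three pages once, even though they link to one another, because a URL enters the frontier only when its insert succeeds. Pushing the duplicate check into the database, as the abstract describes, means that every crawler thread consults the same authoritative record.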

V. S. Dhaka | Sanjeev Kumar Singh | Md. Abu Kausar
