On the feasibility of geographically distributed web crawling
Berkant Barla Cambazoglu | Vassilis Plachouras | Flavio Paiva Junqueira | Luca Telloli
[1] Hector Garcia-Molina,et al. Synchronizing a database to improve freshness , 2000, SIGMOD '00.
[2] Hector Garcia-Molina,et al. Estimating frequency of change , 2003, TOIT.
[3] Marc Najork,et al. Breadth-first crawling yields high-quality pages , 2001, WWW '01.
[4] Hector Garcia-Molina,et al. The Evolution of the Web and Implications for an Incremental Crawler , 2000, VLDB.
[5] C. Lee Giles,et al. Accessibility of information on the Web , 2000, INTL.
[6] Serge Abiteboul,et al. Adaptive on-line page importance computation , 2003, WWW '03.
[7] Hector Garcia-Molina,et al. Combating Web Spam with TrustRank , 2004, VLDB.
[8] Torsten Suel,et al. Design and implementation of a high-performance distributed Web crawler , 2002, Proceedings 18th International Conference on Data Engineering.
[9] Hector Garcia-Molina,et al. Parallel crawlers , 2002, WWW.
[10] Dmitri Loguinov,et al. IRLbot: scaling to 6 billion pages and beyond , 2008, WWW.
[11] B. Huffaker,et al. Distance Metrics in the Internet , 2002, Anais do 2002 International Telecommunications Symposium.
[12] Marc Najork,et al. Mercator: A scalable, extensible Web crawler , 1999, World Wide Web.
[13] Qi Lu,et al. Collaborative Web crawling: information gathering/processing over Internet , 1999, Proceedings of the 32nd Annual Hawaii International Conference on Systems Sciences (HICSS-32).
[14] José Rufino,et al. Efficient Partitioning Strategies for Distributed Web Crawling , 2007, ICOIN.
[15] Filippo Menczer,et al. Search Engine-Crawler Symbiosis: Adapting to Community Interests , 2003, ECDL.
[16] Ricardo A. Baeza-Yates,et al. Crawling a country: better strategies than breadth-first for web page ordering , 2005, WWW '05.
[17] Charles L. A. Clarke,et al. Topic-oriented collaborative crawling , 2002, CIKM '02.
[18] Ricardo A. Baeza-Yates,et al. Challenges on Distributed Web Retrieval , 2007, 2007 IEEE 23rd International Conference on Data Engineering.
[19] Hector Garcia-Molina,et al. Efficient Crawling Through URL Ordering , 1998, Comput. Networks.
[20] Suman Nath,et al. Beyond Availability: Towards a Deeper Understanding of Machine Failure Characteristics in Large Distributed Systems , 2004, WORLDS.
[21] Sandeep Pandey,et al. Recrawl scheduling based on information longevity , 2008, WWW.
[22] Sebastiano Vigna,et al. UbiCrawler: a scalable fully distributed Web crawler , 2004, Softw. Pract. Exp.
[23] Mark Levene,et al. Web dynamics: adapting to change in content, size, topology and use , 2004.
[24] Hector Garcia-Molina,et al. Link Spam Alliances , 2005, VLDB.
[25] David Eichmann,et al. The RBSE spider — Balancing effective search against Web load , 1994, WWW Spring 1994.
[26] Berkant Barla Cambazoglu,et al. Architecture of a grid-enabled Web search engine , 2007, Inf. Process. Manag..
[27] Filippo Menczer,et al. Crawling the Web , 2004, Web Dynamics.
[28] George Cybenko,et al. How dynamic is the Web? , 2000, Comput. Networks.
[29] David Eichmann,et al. 2 – Background: Agents in General and Spiders in Particular , 1994.
[30] Marios D. Dikaiakos,et al. Design and Implementation of a Distributed Crawler and Filtering Processor , 2002, NGITS.
[31] Hector Garcia-Molina,et al. Web Spam Taxonomy , 2005, AIRWeb.
[32] José Rufino,et al. Geographical partition for distributed web crawling , 2005, GIR '05.
[33] Berkant Barla Cambazoglu,et al. Data-Parallel Web Crawling Models , 2004, ISCIS.