BUbiNG: massive crawling for the masses
[1] Jenny Edwards, et al. An adaptive model for optimizing performance of an incremental web crawler, 2001, WWW '01.
[2] Sergey Brin, et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine, 1998, Comput. Networks.
[3] Gregor von Bochmann, et al. A brief history of web crawlers, 2013, CASCON.
[4] Jens Stoye, et al. Simple and flexible detection of contiguous repeats using a suffix tree, 2002, Theor. Comput. Sci.
[5] Idit Keidar, et al. Do not crawl in the DUST: different URLs with similar text, 2006, WWW.
[6] Geoffrey Zweig, et al. Syntactic Clustering of the Web, 1997, Comput. Networks.
[7] Sebastiano Vigna, et al. UbiCrawler: a scalable fully distributed Web crawler, 2004, Softw. Pract. Exp.
[8] David Eichmann, et al. The RBSE spider — Balancing effective search against Web load, 1994, WWW Spring 1994.
[9] Denis Shestakov. Current Challenges in Web Crawling, 2013, ICWE.
[10] Maged M. Michael, et al. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms, 1996, PODC '96.
[11] Marc Najork, et al. High-performance Web Crawling, 2001.
[12] Marc Najork, et al. A large-scale study of the evolution of Web pages, 2003, WWW '03.
[13] Hiroki Arimura, et al. Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications, 2001, CPM.
[14] Adam Rifkin, et al. Nutch: A Flexible and Scalable Open-Source Web Search Engine, 2005.
[15] Oliver A. McBryan, et al. GENVL and WWWW: Tools for taming the Web, 1994, WWW Spring 1994.
[16] Burton H. Bloom, et al. Space/time trade-offs in hash coding with allowable errors, 1970, CACM.
[17] Torsten Suel, et al. Design and implementation of a high-performance distributed Web crawler, 2002, Proceedings 18th International Conference on Data Engineering.
[18] Marc Najork, et al. Mercator: A scalable, extensible Web crawler, 1999, World Wide Web.
[19] Dmitri Loguinov, et al. IRLbot: Scaling to 6 billion pages and beyond, 2009, TWEB.
[20] Marc Najork, et al. Web Crawling, 2010, Found. Trends Inf. Retr.
[21] Soumen Chakrabarti, et al. Mining the web - discovering knowledge from hypertext data, 2002.
[22] Gurmeet Singh Manku, et al. Detecting near-duplicates for web crawling, 2007, WWW '07.
[23] Moses Charikar, et al. Similarity estimation techniques from rounding algorithms, 2002, STOC '02.
[24] B. Pinkerton, et al. Finding What People Want: Experiences with the WebCrawler, 1994, WWW Spring 1994.