Evaluation of crawling policies for a web-repository crawler
[1] Sang Ho Lee,et al. On URL Normalization , 2005, ICCSA.
[2] Michalis Vazirgiannis,et al. Archiving the Greek Web , 2004 .
[3] Michael K. Bergman. The Deep Web: Surfacing Hidden Value , 2000 .
[4] Martin van den Berg,et al. Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery , 1999, Comput. Networks.
[5] Filippo Menczer,et al. Evaluating topic-driven web crawlers , 2001, SIGIR '01.
[6] Ricardo A. Baeza-Yates,et al. Crawling a country: better strategies than breadth-first for web page ordering , 2005, WWW '05.
[7] Sriram Raghavan,et al. Crawling the Hidden Web , 2001, VLDB.
[8] Roy T. Fielding,et al. Uniform Resource Identifiers (URI): Generic Syntax , 1998, RFC.
[9] Marc Najork,et al. Spam, damn spam, and statistics: using statistical analysis to locate spam web pages , 2004, WebDB '04.
[10] Michael L. Nelson,et al. Observed Web Robot Behavior on Decaying Web Subsites , 2006, D Lib Mag..
[11] John Garrett,et al. Preserving Digital Information. Report of the Task Force on Archiving of Digital Information. , 1996 .
[12] Geoffrey Zweig,et al. Syntactic Clustering of the Web , 1997, Comput. Networks.
[13] Rick Bennett,et al. Trends in the Evolution of the Public Web: 1998 - 2002 , 2003, D Lib Mag..
[14] Andrei Z. Broder,et al. Sic transit gloria telae: towards an understanding of the web's decay , 2004, WWW '04.
[15] Z. Dalal,et al. Managing distributed collections: evaluating Web page changes, movement, and replacement , 2004, Proceedings of the 2004 Joint ACM/IEEE Conference on Digital Libraries.
[16] Hector Garcia-Molina,et al. Finding replicated Web collections , 2000, SIGMOD '00.
[17] Petros Zerfos,et al. Downloading textual hidden web content through keyword queries , 2005, Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05).
[18] Hector Garcia-Molina,et al. Parallel crawlers , 2002, WWW.
[19] Michael L. Nelson,et al. Just-in-time recovery of missing web pages , 2006, HYPERTEXT '06.
[20] Herbert Van de Sompel,et al. mod_oai: An Apache Module for Metadata Harvesting , 2005, ECDL.
[21] Hector Garcia-Molina,et al. Synchronizing a database to improve freshness , 2000, SIGMOD '00.
[22] Marc Najork,et al. Breadth-first crawling yields high-quality pages , 2001, WWW '01.
[23] Philip S. Yu,et al. Optimal crawling strategies for web search engines , 2002, WWW '02.
[24] Sriram Raghavan,et al. Searching the Web , 2001, ACM Trans. Internet Techn..
[25] James S. Plank,et al. A tutorial on Reed–Solomon coding for fault‐tolerance in RAID‐like systems , 1997, Softw. Pract. Exp..
[26] Michael O. Rabin,et al. Efficient dispersal of information for security, load balancing, and fault tolerance , 1989, JACM.
[27] David W. Embley,et al. On the Automatic Extraction of Data from the Hidden Web , 2001, ER.
[28] Hector Garcia-Molina,et al. The Evolution of the Web and Implications for an Incremental Crawler , 2000, VLDB.
[29] Marián Boguñá,et al. Decoding the structure of the WWW: facts versus sampling biases , 2005, ArXiv.
[30] Catherine C. Marshall,et al. Saving private hypertext: requirements and pragmatic dimensions for preservation , 2004, HYPERTEXT '04.
[31] Ricardo A. Baeza-Yates,et al. Characterization of national Web domains , 2007, TOIT.
[32] David W. Embley,et al. Extracting Data behind Web Forms , 2002, ER.
[33] Torsten Suel,et al. Design and implementation of a high-performance distributed Web crawler , 2002, Proceedings 18th International Conference on Data Engineering.
[34] Daniel Gomes,et al. Characterizing a national community web , 2005, TOIT.
[35] D. M. Hutton,et al. Web Dynamics - Adapting to Change in Content, Size, Topology and Use , 2006 .
[36] Sougata Mukherjea,et al. Organizing topic-specific web information , 2000, HYPERTEXT '00.
[37] Johan Bollen,et al. Distributed, real-time computation of community preferences , 2005, HYPERTEXT '05.
[38] John R. Garrett,et al. Task Force on Archiving of Digital Information , 1995, D Lib Mag..
[39] Roy T. Fielding,et al. Uniform Resource Identifier (URI): Generic Syntax , 2005, RFC.
[40] Jenny Edwards,et al. An adaptive model for optimizing performance of an incremental web crawler , 2001, WWW '01.
[41] Marco Gori,et al. Focused Crawling Using Context Graphs , 2000, VLDB.
[42] Johan Bollen,et al. Reconstructing Websites for the Lazy Webmaster , 2005, ArXiv.
[43] James S. Plank. A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems , 1997 .
[44] Marc Najork,et al. Detecting phrase-level duplication on the world wide web , 2005, SIGIR '05.
[45] Chabane Djeraba,et al. High performance crawling system , 2004, MIR '04.
[46] Vivian Cothey,et al. Web-crawling reliability , 2004, J. Assoc. Inf. Sci. Technol..
[47] Filippo Menczer,et al. Crawling the Web , 2004, Web Dynamics.