Lazy preservation: reconstructing websites by crawling the crawlers
[1] James S. Plank. A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems, 1997, Softw. Pract. Exp.
[2] James S. Plank. A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems, 1997.
[3] Geoffrey Zweig et al. Syntactic Clustering of the Web, 1997, Comput. Networks.
[4] Amy Friedlander et al. D-Lib Magazine: Publishing as the Honest Broker, 1998.
[5] Andrei Z. Broder et al. Mirror, Mirror on the Web: A Study of Host Pairs with Replicated Content, 1999, Comput. Networks.
[6] Michael D. Gordon et al. Finding Information on the World Wide Web: The Retrieval Effectiveness of Search Engines, 1999, Inf. Process. Manag.
[7] Hector Garcia-Molina et al. Finding replicated Web collections, 2000, SIGMOD '00.
[8] Ming-Feng Chen et al. A proxy-based personal web archiving service, 2001, OPSR.
[9] Vicky Reich et al. LOCKSS: A Permanent Web Publishing and Access System, 2001, D Lib Mag.
[10] Hal Berghel. Responsible web caching, 2002, CACM.
[11] Marc Najork et al. A large-scale study of the evolution of Web pages, 2003, WWW '03.
[12] Hector Garcia-Molina et al. InfoMonitor: unobtrusively archiving a World Wide Web server, 2005, International Journal on Digital Libraries.
[13] Michael Day et al. Collecting and preserving the world wide web, 2003.
[14] Rabia Nuray-Turan et al. Automatic performance evaluation of Web search engines, 2004, Inf. Process. Manag.
[15] Marc Najork et al. Spam, damn spam, and statistics: using statistical analysis to locate spam web pages, 2004, WebDB '04.
[16] Christopher Olston et al. What's new on the web?: the evolution of the web from a search engine perspective, 2004, WWW '04.
[17] Website navigation architectures and their effect on website visibility: a literature survey, 2004.
[18] Curtis E. Dyreson et al. Managing versions of web documents in a transaction-time web server, 2004, WWW '04.
[19] Antonio Gulli et al. The indexable web is more than 11.5 billion pages, 2005, WWW '05.
[20] Johan Bollen et al. Reconstructing Websites for the Lazy Webmaster, 2005, ArXiv.
[21] Jin Zhang et al. The impact of webpage content characteristics on webpage visibility in search engine results (Part I), 2005, Inf. Process. Manag.
[22] Michael L. Nelson et al. Observed Web Robot Behavior on Decaying Web Subsites, 2006, D Lib Mag.
[23] Dirk Lewandowski et al. The freshness of web search engine databases, 2006, J. Inf. Sci.
[24] Mohammad Zubair et al. Search engine coverage of the OAI-PMH corpus, 2006, IEEE Internet Computing.
[25] Michael L. Nelson et al. Evaluation of crawling policies for a web-repository crawler, 2006, HYPERTEXT '06.