Using the web infrastructure to preserve web pages
Michael L. Nelson | Martin Klein | Frank McCown | Joan A. Smith
[1] Clay Shirky,et al. AIHT: Conceptual Issues from Practical Tests , 2005, D Lib Mag..
[2] Jon Postel,et al. Simple Mail Transfer Protocol , 1981, RFC.
[3] Carl Lagoze,et al. Focused Crawls, Tunneling, and Digital Libraries , 2002, ECDL.
[4] John C. Klensin,et al. Simple Mail Transfer Protocol , 2001, RFC.
[5] Ian Clarke,et al. Protecting Free Expression Online with Freenet , 2002, IEEE Internet Comput..
[6] Hector Garcia-Molina,et al. Finding replicated Web collections , 2000, SIGMOD '00.
[7] Curtis E. Dyreson,et al. Managing versions of web documents in a transaction-time web server , 2004, WWW '04.
[8] Herbert Van de Sompel,et al. Resource Harvesting within the OAI-PMH Framework , 2004, D Lib Mag..
[9] Wallace Koehler,et al. Web page change and persistence - A four-year longitudinal study , 2002, J. Assoc. Inf. Sci. Technol..
[10] Hector Garcia-Molina,et al. Parallel crawlers , 2002, WWW.
[11] H. M. Gladney,et al. Trustworthy 100-year digital objects , 2005 .
[12] Sriram Raghavan,et al. Stanford WebBase components and applications , 2006, TOIT.
[13] Idit Keidar,et al. Do not crawl in the DUST: different URLs with similar text , 2006, WWW.
[14] Joachim Feise. An approach to persistence of Web resources , 2001, HYPERTEXT '01.
[15] Sandra Payette,et al. The Mellon Fedora Project , 2002, ECDL.
[16] Filippo Menczer,et al. Evaluating topic-driven web crawlers , 2001, SIGIR '01.
[17] Marc Najork,et al. Spam, damn spam, and statistics: using statistical analysis to locate spam web pages , 2004, WebDB '04.
[18] Michael L. Nelson,et al. Observed Web Robot Behavior on Decaying Web Subsites , 2006, D Lib Mag..
[19] Herbert Van de Sompel,et al. Representing digital assets using MPEG-21 Digital Item Declaration , 2005, International Journal on Digital Libraries.
[20] Michael L. Nelson,et al. Repository Replication Using NNTP and SMTP , 2006, ECDL.
[21] Stevan Harnad,et al. Applications, Potential Problems and a Suggested Policy for Institutional E-Print Archives , 2002 .
[22] Ricardo A. Baeza-Yates,et al. Crawling the Infinite Web: Five Levels Are Enough , 2004, WAW.
[23] Andrei Z. Broder,et al. Efficient URL caching for world wide web crawling , 2003, WWW '03.
[24] Roy H. Campbell,et al. Internet search engine freshness by Web server help , 2001, Proceedings 2001 Symposium on Applications and the Internet.
[25] Antony I. T. Rowstron,et al. Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility , 2001, SOSP.
[26] David M. Pennock,et al. Analysis of lexical signatures for improving information persistence on the World Wide Web , 2004, TOIS.
[27] Johan Bollen,et al. The Availability and Persistence of Web References in D-Lib Magazine , 2005, ArXiv.
[28] Mary Baker,et al. The LOCKSS peer-to-peer digital preservation system , 2005, TOCS.
[29] William Y. Arms,et al. Building a research library for the history of the web , 2006, Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '06).
[30] Carl Lagoze,et al. Core services in the architecture of the national science digital library (NSDL) , 2002, JCDL '02.
[31] Norman Paskin. Digital object identifiers , 2002 .
[32] Michael L. Nelson,et al. Lazy preservation: reconstructing websites by crawling the crawlers , 2006, WIDM '06.
[33] Robert Wilensky,et al. Robust Hyperlinks Cost Just Five Words Each , 2000 .
[34] Tony Hammond,et al. Social Bookmarking Tools (I): A General Overview , 2005, D Lib Mag..
[35] Herbert Van de Sompel,et al. Notes from the Interoperability Front: A Progress Report on the Open Archives Initiative , 2002, ECDL.
[36] Jerome McDonough,et al. METS: standardized encoding for digital library objects , 2006, International Journal on Digital Libraries.
[37] Garth A. Gibson,et al. RAID: high-performance, reliable secondary storage , 1994, CSUR.
[38] Serge Abiteboul,et al. A First Experience in Archiving the French Web , 2002, ECDL.
[39] Sriram Raghavan,et al. Crawling the Hidden Web , 2001, VLDB.
[40] Zhen Liu,et al. Optimal Robot Scheduling for Web Search Engines , 1998 .
[41] Michael L. Nelson,et al. Just-in-time recovery of missing web pages , 2006, HYPERTEXT '06.
[42] Antonio Gulli,et al. The indexable web is more than 11.5 billion pages , 2005, WWW '05.
[43] A. Rauber,et al. Austrian On-Line Archive Processing: Analyzing Archives of the World Wide Web , 2002 .
[44] David M. Pennock,et al. Persistence of Web References in Scientific Research , 2001, Computer.
[45] Brewster Kahle,et al. Preserving the Internet , 1997 .
[46] Johan Bollen,et al. Archive Ingest and Handling Test: The Old Dominion University Approach , 2005, D Lib Mag..
[47] C. Lee Giles,et al. Digital Libraries and Autonomous Citation Indexing , 1999, Computer.
[48] Hector Garcia-Molina,et al. Effective page refresh policies for Web crawlers , 2003, TODS.
[49] Michael L. Nelson,et al. Object Persistence and Availability in Digital Libraries , 2002, D Lib Mag..
[50] Diomidis Spinellis,et al. The decay and failures of web references , 2003, CACM.
[51] Marc Najork,et al. A large‐scale study of the evolution of Web pages , 2004, Softw. Pract. Exp..
[52] Ben Y. Zhao,et al. Maintenance-Free Global Data Storage , 2001, IEEE Internet Comput..
[53] Hector Garcia-Molina,et al. Finding Near-Replicas of Documents and Servers on the Web , 1998, WebDB.
[54] Hector Garcia-Molina,et al. The Evolution of the Web and Implications for an Incremental Crawler , 2000, VLDB.
[55] Micah Beck,et al. An end-to-end approach to globally scalable network storage , 2002, SIGCOMM '02.
[56] Herbert Van de Sompel,et al. Using MPEG-21 DIDL to Represent Complex Digital Objects in the Los Alamos National Laboratory Digital Library , 2003, D Lib Mag..
[57] Andrew V. Goldberg,et al. A prototype implementation of archival Intermemory , 1999, DL '99.
[58] Hector Garcia-Molina,et al. Peer-to-peer data trading to preserve information , 2002, TOIS.
[59] Johan Bollen,et al. Reconstructing Websites for the Lazy Webmaster , 2005, ArXiv.
[60] Andreas Rauber,et al. Austrian Online Archive Processing: Analyzing Archives of the World Wide Web , 2002, ECDL.
[61] Hector Garcia-Molina,et al. Combating Web Spam with TrustRank , 2004, VLDB.
[62] James S. Plank. A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems , 1997 .
[63] Hector Garcia-Molina,et al. Crawler-Friendly Web Servers , 2000, PERV.
[64] John R. Garrett,et al. Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation , 2009 .
[65] Andrei Z. Broder,et al. Mirror, Mirror on the Web: A Study of Host Pairs with Replicated Content , 1999, Comput. Networks.
[66] Herbert Van de Sompel,et al. aDORe: a modular and standards-based digital object repository at the Los Alamos National Laboratory , 2005, Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05).
[67] Reagan Moore,et al. MySRB & SRB: Components of a Data Grid , 2002 .
[68] Sandeep Pandey,et al. Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results , 2005, VLDB.
[69] Roy Fielding,et al. Architectural Styles and the Design of Network-based Software Architectures (Doctoral dissertation) , 2000 .
[70] Andrei Z. Broder,et al. Sic transit gloria telae: towards an understanding of the web's decay , 2004, WWW '04.
[71] Roger Dingledine,et al. The Free Haven Project: Distributed Anonymous Storage Service , 2000, Workshop on Design Issues in Anonymity and Unobservability.
[72] Michael L. Nelson,et al. Efficient, automatic web resource harvesting , 2006, WIDM '06.
[73] Herbert Van de Sompel,et al. mod_oai: An Apache Module for Metadata Harvesting , 2005, ECDL.
[74] Michael L. Nelson,et al. Repository replication using SMTP and NNTP , 2006, DG.O.
[75] Michalis Vazirgiannis,et al. Archiving the Greek Web , 2004 .
[76] B. Huberman,et al. The Deep Web: Surfacing Hidden Value , 2000 .
[77] Ming-Feng Chen,et al. A proxy-based personal web archiving service , 2001, OPSR.
[78] Jenny Edwards,et al. An adaptive model for optimizing performance of an incremental web crawler , 2001, WWW '01.
[79] Henry M. Gladney,et al. Trustworthy 100-year digital objects: Evidence after every witness is dead , 2004, TOIS.
[80] Reagan Moore,et al. MySRB and SRB - components of a Data Grid , 2002, Proceedings 11th IEEE International Symposium on High Performance Distributed Computing.
[81] Christopher Olston,et al. What's new on the web?: the evolution of the web from a search engine perspective , 2004, WWW '04.
[82] John Garrett,et al. Preserving Digital Information. Report of the Task Force on Archiving of Digital Information. , 1996 .
[83] J.L. Marill,et al. Tools and techniques for harvesting the World Wide Web , 2004, Proceedings of the 2004 Joint ACM/IEEE Conference on Digital Libraries, 2004..
[84] Hector Garcia-Molina,et al. InfoMonitor: unobtrusively archiving a World Wide Web server , 2005, International Journal on Digital Libraries.
[85] Michael Day,et al. Collecting and preserving the world wide web , 2003 .
[86] MacKenzie Smith,et al. The DSpace institutional digital repository system: current functionality , 2003, 2003 Joint Conference on Digital Libraries, 2003. Proceedings..
[87] Chabane Djeraba,et al. High performance crawling system , 2004, MIR '04.
[88] Larry Lannom,et al. Handle System Overview , 2003, RFC.
[89] Herbert Van de Sompel,et al. The open archives initiative: building a low-barrier interoperability framework , 2001, JCDL '01.
[90] Gul A. Agha,et al. Crawlets: Agents for High Performance Web Search Engines , 2001, Mobile Agents.
[91] Brian Kantor,et al. Network News Transfer Protocol , 1986, RFC.
[92] Hector Garcia-Molina,et al. Estimating frequency of change , 2003, TOIT.
[93] Petros Zerfos,et al. Downloading textual hidden web content through keyword queries , 2005, Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05).
[94] Hector Garcia-Molina,et al. Efficient Crawling Through URL Ordering , 1998, Comput. Networks.
[95] Michael K. Bergman. White Paper: The Deep Web: Surfacing Hidden Value , 2001 .
[96] Terry L. Harrison,et al. Opal: In Vivo Based Preservation Framework for Locating Lost Web Pages , 2005 .
[97] Michael L. Nelson,et al. Evaluation of crawling policies for a web-repository crawler , 2006, HYPERTEXT '06.
[98] David R. Karger,et al. Wide-area cooperative storage with CFS , 2001, SOSP.
[99] Hector Garcia-Molina,et al. Implementing a Reliable Digital Object Archive , 2000, ECDL.