Integrating preservation functions into the web server
[1] Hector Garcia-Molina,et al. Finding replicated Web collections , 2000, SIGMOD 2000.
[2] Dan Cohen. Rough Start For Digital Preservation , 2006 .
[3] Roy T. Fielding,et al. Uniform Resource Identifier (URI): Generic Syntax , 2005, RFC.
[4] Christopher Olston,et al. What's new on the web?: the evolution of the web from a search engine perspective , 2004, WWW '04.
[5] Vicky Reich,et al. Requirements for Digital Preservation Systems: A Bottom-Up Approach , 2005, D Lib Mag..
[6] Michael L. Nelson,et al. CRATE: A Simple Model for Self-Describing Web Resources , 2007 .
[7] Andrew Waugh. The design of the VERS encapsulated object experience with an archival information package , 2005, International Journal on Digital Libraries.
[8] Johan Bollen,et al. Archive Ingest and Handling Test: The Old Dominion University Approach , 2005, D Lib Mag..
[9] Monika Henzinger,et al. Hyperlink Analysis for the Web , 2001, IEEE Internet Comput..
[10] Michael L. Nelson,et al. Site Design Impact on Robots: An Examination of Search Engine Crawler Behavior at Deep and Wide Websites , 2008, D Lib Mag..
[11] Luis Gravano,et al. Probe, count, and classify: categorizing hidden web databases , 2001, SIGMOD '01.
[12] Herbert Van de Sompel,et al. Using the OAI-PMH ... Differently , 2003, D Lib Mag..
[13] Michael L. Nelson,et al. Efficient, automatic web resource harvesting , 2006, WIDM '06.
[14] Herbert Van de Sompel,et al. mod_oai: An Apache Module for Metadata Harvesting , 2005, ECDL.
[15] Peter S. Lyman. Archiving the World Wide Web , 2002 .
[16] Robert T. Braden,et al. Requirements for Internet Hosts - Communication Layers , 1989, RFC.
[17] Peter B. Danzig,et al. The Harvest Information Discovery and Access System , 1995, Comput. Networks ISDN Syst..
[18] Jane Greenberg,et al. Final Report for the AMeGA (Automatic Metadata Generation Applications) Project , 2005 .
[19] Gilad Mishne,et al. Blocking Blog Spam with Language Model Disagreement , 2005, AIRWeb.
[20] Magnus Karlsson,et al. Dynamics and evolution of Web sites: analysis, metrics and design issues , 2001, Proceedings. Sixth IEEE Symposium on Computers and Communications.
[21] Michael L. Nelson,et al. Repository replication using SMTP and NNTP , 2006, DG.O.
[22] E. James Whitehead,et al. HTTP Extensions for Distributed Authoring - WEBDAV , 1999, RFC.
[23] Ross Wilkinson,et al. Preserving digital information forever , 2000, DL '00.
[24] R. Fielding,et al. Architectural Styles and the Design of Network-based Software Architectures (CHAPTER 5) , 2000 .
[25] Lawrence Shaw Mayo,et al. The Harvest of a Quiet Eye , 1928 .
[26] Mary Baker,et al. The LOCKSS peer-to-peer digital preservation system , 2005, TOCS.
[27] Zhenyu Liu,et al. A probabilistic approach to metasearching with adaptive probing , 2004, Proceedings. 20th International Conference on Data Engineering.
[28] Carlos Castillo. Cooperation schemes between a Web server and a Web search engine , 2003, Proceedings of the IEEE/LEOS 3rd International Conference on Numerical Simulation of Semiconductor Optoelectronic Devices (IEEE Cat. No.03EX726).
[29] Michael L. Nelson,et al. Creating Preservation-Ready Web Resources , 2008, D Lib Mag..
[30] Jon Postel,et al. DOD standard internet protocol , 1980, CCRV.
[31] Idit Keidar,et al. Do not crawl in the DUST: different URLs with similar text , 2006, WWW.
[32] Marc Najork,et al. A large‐scale study of the evolution of Web pages , 2004, Softw. Pract. Exp..
[33] Andrew Waugh. The Design and Implementation of an Ingest Function to a Digital Archive , 2007, D Lib Mag..
[34] Michael L. Nelson,et al. A Survey of Complex Object Technologies for Digital Libraries , 2001 .
[35] Jeff Rothenberg. Ensuring the Longevity of Digital Information , 1998 .
[36] Clifford A. Lynch. When documents deceive: trust and provenance as new factors for information retrieval in a tangled web , 2001 .
[37] Antonio Gulli,et al. The indexable web is more than 11.5 billion pages , 2005, WWW '05.
[38] Nathaniel S. Borenstein,et al. Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies , 1996, RFC.
[39] Herbert Van de Sompel,et al. Notes from the Interoperability Front: A Progress Report on the Open Archives Initiative , 2002, ECDL.
[40] Herbert Van de Sompel,et al. The OAI-PMH static repository and static repository gateway , 2003, 2003 Joint Conference on Digital Libraries, 2003. Proceedings..
[41] Gary S. Robinson,et al. History and Impact of Computer Standards , 1996, Computer.
[42] Henry M. Gladney,et al. Trustworthy 100-year digital objects: Evidence after every witness is dead , 2004, TOIS.
[43] Francois Yergeau,et al. UTF-8, a transformation format of ISO 10646 , 1998, RFC.
[44] Kurt Maly,et al. Repository synchronization in the OAI framework , 2003, 2003 Joint Conference on Digital Libraries, 2003. Proceedings..
[45] Marc Najork,et al. Detecting spam web pages through content analysis , 2006, WWW '06.
[46] IL University of Illinois at Urbana-Champaign,et al. Inter-indexer consistency studies, 1954-1975: a review of the literature and summary of study results , 2007 .
[47] Bas Savenije,et al. The National Library of the Netherlands , 2009 .
[48] William Y. Arms. Digital Libraries , 1999 .
[49] Michal Cutler,et al. The portrait of a common HTML web page , 2006, DocEng '06.
[50] Johan Bollen,et al. Reconstructing Websites for the Lazy Webmaster , 2005, ArXiv.
[51] Carl Lagoze,et al. The Open Archives Initiative Protocol for Metadata Harvesting , 2002 .
[52] Hector Garcia-Molina,et al. Crawler-Friendly Web Servers , 2000, PERV.
[53] Nathaniel S. Borenstein,et al. Multipurpose Internet Mail Extensions , 1992 .
[54] Sriram Raghavan,et al. Crawling the Hidden Web , 2001, VLDB.
[55] Andrei Z. Broder,et al. Mirror, Mirror on the Web: A Study of Host Pairs with Replicated Content , 1999, Comput. Networks.
[56] Herbert Van de Sompel,et al. The open archives initiative , 2001 .
[57] Johan Stapel. Koninklijke Bibliotheek National Library of The Netherlands , 2003 .
[58] Tim Berners-Lee,et al. Universal Resource Identifiers in WWW: A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web , 1994, RFC.
[59] Dave Johnson,et al. RSS and Atom in Action: Web 2.0 Building Blocks , 2006 .
[60] William Y. Arms. Key concepts in the architecture of the digital library , 1995, D Lib Mag..
[61] Clay Shirky,et al. AIHT: Conceptual Issues from Practical Tests , 2005, D Lib Mag..
[62] David Bearman. Reality and Chimeras in the Preservation of Electronic Records , 1999, D Lib Mag..
[63] Mo Chen,et al. A practical system of keyphrase extraction for web pages , 2005, CIKM '05.
[64] Rebecca S. Guenther,et al. MODS: The Metadata Object Description Schema , 2003 .
[65] Steven Pemberton,et al. RDFa in XHTML: Syntax and Processing , 2008 .
[66] James A. Hendler,et al. "The Semantic Web" in Scientific American , 2001 .
[67] M. Hauben,et al. Netizens: On the History and Impact of Usenet and the Internet , 1998, First Monday.
[68] Andrew Tomkins,et al. The volume and evolution of web page templates , 2005, WWW '05.
[69] Martin Bergman,et al. The deep web: surfacing the hidden value , 2000 .
[70] Gary D. Scudder,et al. On the selection of efficient record segmentations and backup strategies for large shared databases , 1984, TODS.
[71] Lois Mai Chan,et al. Inter-Indexer Consistency in Subject Cataloging. , 1989 .
[72] Betty Furrie,et al. Understanding MARC Bibliographic: Machine-Readable Cataloging , 2003 .
[73] Kevin Hemenway,et al. Spidering Hacks , 2003 .
[74] Mohammad Zubair,et al. Search engine coverage of the OAI-PMH corpus , 2006, IEEE Internet Computing.
[75] Roy T. Fielding,et al. Hypertext Transfer Protocol - HTTP/1.0 , 1996, RFC.
[76] Vivian Cothey,et al. Web-crawling reliability , 2004, J. Assoc. Inf. Sci. Technol..
[77] Herbert Van de Sompel,et al. The Santa Fe Convention of the Open Archives Initiative , 2000, D Lib Mag..
[78] Filippo Menczer,et al. Crawling the Web , 2004, Web Dynamics.
[79] Jerome McDonough,et al. METS: standardized encoding for digital library objects , 2006, International Journal on Digital Libraries.
[80] Hector Garcia-Molina,et al. Effective page refresh policies for Web crawlers , 2003, TODS.
[81] Clifford A. Lynch,et al. When documents deceive: Trust and provenance as new factors for information retrieval in a tangled web , 2001, J. Assoc. Inf. Sci. Technol..
[82] Michael L. Nelson,et al. How much preservation do I get if I do absolutely nothing? Using the Web Infrastructure for Digital Preservation , 2007 .
[83] Rick Bennett,et al. Trends in the Evolution of the Public Web: 1998 - 2002 , 2003, D Lib Mag..
[84] Herbert Van de Sompel,et al. Object Re-Use & Exchange: A Resource-Centric Approach , 2008, ArXiv.
[85] Darren R. Hardy,et al. Customized information extraction as a basis for resource discovery , 1996, TOCS.
[86] Luke Rodgers. What is RSS , 2008 .
[87] Philip R. Zimmermann,et al. The official PGP user's guide , 1996 .
[88] B. Huberman,et al. The Deep Web : Surfacing Hidden Value , 2000 .
[89] William J. Broad. US Web Archive Is Said to Reveal a Nuclear Primer , 2006 .
[90] Geoffrey M. Voelker,et al. Characterization of a Large Web Site Population with Implications for Content Delivery , 2004, WWW '04.
[91] Hector Garcia-Molina,et al. Estimating frequency of change , 2003, TOIT.
[92] Charles F. Thomas,et al. Who Will Create The Metadata For The Internet? , 1998, First Monday.
[93] Ricardo A. Baeza-Yates,et al. Characterization of national Web domains , 2007, TOIT.
[94] Michael L. Nelson,et al. Factors affecting website reconstruction from the web infrastructure , 2007, JCDL '07.
[95] Michael L. Nelson,et al. Using the web infrastructure to preserve web pages , 2007, International Journal on Digital Libraries.
[96] Michael L. Nelson,et al. Generating best-effort preservation metadata for web resources at time of dissemination , 2007, JCDL '07.
[97] David M. Levy,et al. Heroic measures: reflections on the possibility and purpose of digital preservation , 1998, DL '98.
[98] Stuart Weibel. Metadata: the foundations of resource description , 1995, D Lib Mag..
[99] Michael L. Nelson,et al. Repository Replication Using NNTP and SMTP , 2006, ECDL.
[100] Michael L. Nelson,et al. Brass: A queueing manager for Warrick , 2007 .
[101] Sriram Raghavan,et al. WebBase: a repository of Web pages , 2000, Comput. Networks.
[102] Roy T. Fielding,et al. Hypertext Transfer Protocol - HTTP/1.0 , 1996, RFC.
[103] Michael L. Nelson,et al. Observed Web Robot Behavior on Decaying Web Subsites , 2006, D Lib Mag..
[104] Herbert Van de Sompel,et al. Representing digital assets using MPEG-21 Digital Item Declaration , 2005, International Journal on Digital Libraries.
[105] Andrew H. Mutz,et al. Transparent Content Negotiation in HTTP , 1998, RFC.
[106] Marc Najork,et al. A large‐scale study of the evolution of Web pages , 2003, WWW '03.
[107] Hector Garcia-Molina,et al. Efficient Crawling Through URL Ordering , 1998, Comput. Networks.
[108] Herbert Van de Sompel,et al. A Standards-based Solution for the Accurate Transfer of Digital Assets , 2005, D Lib Mag..
[109] Herbert Van de Sompel,et al. IJDL special issue on complex digital objects: Guest editors' introduction , 2005, International Journal on Digital Libraries.
[110] Michael L. Nelson,et al. A Quantitative Evaluation of Dissemination-Time Preservation Metadata , 2008, ECDL.
[111] Herbert Van de Sompel,et al. Open Archives Initiative - Protocol for Metadata Harvesting - Guidelines for Repository Implementers , 2005 .
[112] Simon Josefsson,et al. The Base16, Base32, and Base64 Data Encodings , 2003, RFC.
[113] Clifford A. Lynch,et al. Canonicalization: A Fundamental Tool to Facilitate Preservation and Management of Digital Information , 1999, D-Lib Magazine.
[114] Petros Zerfos,et al. Downloading textual hidden web content through keyword queries , 2005, Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05).
[115] Hector Garcia-Molina,et al. Finding replicated Web collections , 2000, SIGMOD '00.
[116] Herbert Van de Sompel,et al. Resource Harvesting within the OAI-PMH Framework , 2004, D Lib Mag..
[117] William Y. Arms. Preservation of Scientific Serials: Three Current Examples , 1999 .
[118] Tim Berners-Lee,et al. Uniform Resource Locators (URL) , 1994, RFC.
[119] Robert Wilensky,et al. A framework for distributed digital object services , 2006, International Journal on Digital Libraries.
[120] Nathaniel S. Borenstein,et al. MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies , 1992, RFC.
[121] Eric Miller,et al. An Introduction to the Resource Description Framework , 1998, D Lib Mag..
[122] Ling Liu,et al. Probe, cluster, and discover: focused extraction of QA-Pagelets from the deep Web , 2004, Proceedings. 20th International Conference on Data Engineering.
[123] Clay Shirky. Library of Congress Archive Ingest and Handling Test (AIHT) Final Report , 2006 .
[124] Yuval Shavitt,et al. Constrained mirror placement on the Internet , 2001, Proceedings IEEE INFOCOM 2001. Conference on Computer Communications. Twentieth Annual Joint Conference of the IEEE Computer and Communications Society (Cat. No.01CH37213).
[125] Marc Najork,et al. Mercator: A scalable, extensible Web crawler , 1999, World Wide Web.