Memento: Time Travel for the Web

The Web is ephemeral. Many resources have representations that change over time, and many of those representations are lost forever. A lucky few reappear as archived resources that carry their own URIs. For example, some content management systems maintain version pages that reflect a frozen prior state of their changing resources. Archives recurrently crawl the Web to obtain the current representation of resources, and subsequently make those available via special-purpose archived resources. In both cases, the archival copies have URIs that are disconnected, at the protocol level, from the URI of the resource of which they represent a prior state. Indeed, the lack of temporal capabilities in the most common Web protocol, HTTP, prevents reaching an archived resource on the basis of the URI of its original. This turns accessing archived resources into a significant discovery challenge for both human and software agents, which typically involves following a multitude of links from the original to the archival resource, or searching archives for the original URI. This paper proposes the protocol-based Memento solution to address this problem, and describes a proof-of-concept experiment that includes major servers of archival content, including Wikipedia and the Internet Archive. The Memento solution is based on existing HTTP capabilities applied in a novel way to add the temporal dimension. The result is a framework in which archived resources can seamlessly be reached via the URI of their original: protocol-based time travel for the Web.
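The idea of reaching an archived copy through the URI of its original can be illustrated with a minimal Python sketch. It is not the implementation described in the paper. It assumes that a client expresses the desired time in a request header, here named `Accept-Datetime` as in the later RFC 7089 standardization of Memento (the abstract itself does not name the header), and that the server answers with the archived copy closest in time. The function names and the example URIs are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when):
    """Build a request header carrying the desired datetime as an HTTP-date.

    The header name is an assumption taken from RFC 7089; it is not
    stated in the abstract.
    """
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return {"Accept-Datetime": http_date}


def select_archived_copy(copies, requested):
    """Pick the (uri, datetime) pair closest in time to the requested datetime.

    `copies` stands in for whatever index of archived resources a server
    keeps for one original URI (hypothetical data structure).
    """
    return min(copies, key=lambda copy: abs(copy[1] - requested))


# Hypothetical archived copies of one original resource.
copies = [
    ("http://archive.example.org/2007/page", datetime(2007, 5, 1, tzinfo=timezone.utc)),
    ("http://archive.example.org/2009/page", datetime(2009, 3, 1, tzinfo=timezone.utc)),
    ("http://archive.example.org/2010/page", datetime(2010, 8, 1, tzinfo=timezone.utc)),
]

requested = datetime(2009, 1, 1, tzinfo=timezone.utc)
headers = accept_datetime_header(requested)
chosen_uri, chosen_time = select_archived_copy(copies, requested)
print(headers["Accept-Datetime"])
print(chosen_uri)
```

The client only needs the original URI and a datetime; selecting the archived copy is left to the server side, which is the sense in which the approach is protocol-based rather than link-following or search-based.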
