Brass: A queueing manager for Warrick

When an individual loses their website and no backup can be found, they can download and run Warrick, a web-repository crawler that recovers lost websites by crawling the holdings of the Internet Archive and several search engine caches. Running Warrick locally requires some technical know-how, so we have created an online queueing system called Brass which simplifies the task of recovering lost websites. We discuss the technical aspects of reconstructing websites and the implementation of Brass. Our newly developed system allows anyone to recover a lost website with a few mouse clicks and allows us to track which websites the public is most interested in saving.
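The core idea of a queueing manager like the one the abstract describes can be sketched as a minimal FIFO job queue: users submit recovery requests, and jobs are dispatched to the crawler in arrival order. This is an illustrative sketch only; the class and method names are hypothetical and do not reflect Brass's actual implementation or API.

```python
from collections import deque


class RecoveryQueue:
    """Hypothetical sketch of a Brass-style FIFO queue of
    website-recovery requests (names are illustrative)."""

    def __init__(self):
        self._jobs = deque()

    def submit(self, url):
        """Enqueue a recovery request; return its queue position."""
        self._jobs.append(url)
        return len(self._jobs)

    def next_job(self):
        """Dequeue the oldest pending request (FIFO), or None if empty."""
        return self._jobs.popleft() if self._jobs else None


q = RecoveryQueue()
q.submit("http://example.org")   # position 1
q.submit("http://example.net")   # position 2
print(q.next_job())              # -> http://example.org
```

In practice a system like this would also persist jobs and report progress back to the submitter, but the first-come, first-served dispatch shown here is the essential queueing behavior.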
