Warclight: A Rails Engine for Web Archive Discovery

This paper describes the development of Warclight, whose name is a portmanteau of Blacklight, the open-source platform, and WARC, the ISO-standard Web ARChive file format. Warclight allows users to explore web archives that have been indexed into Apache Solr using the UK Web Archive's Web Archive Discovery tool. Drawing on previous work, we explain why the standard search engine results page is inadequate for supporting scholarly inquiry. Instead, Warclight provides full-text search together with faceted search and browsing to enable exploration and discovery. Given the large size of many web archives, we also share our experience deploying the tool at scale using a federated architecture.
