Web archives are large longitudinal collections that store webpages from the past, which may be missing from the current live Web. Temporal search over such collections is therefore essential for finding prominent missing webpages and for tasks such as historical analysis. However, this has been challenging due to the lack of popularity information and of proper ground truth for evaluating temporal retrieval models. In this paper, we investigate the applicability of external longitudinal resources for identifying websites that were important and popular in the past, and we analyze the social bookmarking service Delicious for this purpose. The timestamped bookmarks on Delicious provide explicit cues about popular time periods in the past, along with relevant descriptors. These are valuable for identifying documents that were important in the past for a given temporal query. Focusing purely on recall, we analyzed more than 12,000 queries and found that using Delicious yields average recall values from 46% up to 100% when limiting ourselves to the best-represented queries in the considered dataset. This constitutes an attractive, low-overhead approach for quick access to Web archives, as it does not require dealing with the actual page contents.