Query representation for cross-temporal information retrieval

This paper addresses the problem of long-term language change in information retrieval (IR) systems. IR research has often ignored lexical drift. But in the emerging domain of massive digitized book collections, the risk of vocabulary mismatch due to language change is high. Collections such as Google Books and the Hathi Trust contain text written in the vernaculars of many centuries. With respect to IR, changes in vocabulary and orthography make 14th-Century English qualitatively different from 21st-Century English. This challenges retrieval models that rely on keyword matching. With this challenge in mind, we ask: given a query written in contemporary English, how can we retrieve relevant documents that were written in early English? We argue that search in historically diverse corpora is similar to cross-language retrieval (CLIR). By considering "modern" English and "archaic" English as distinct languages, CLIR techniques can improve what we call cross-temporal IR (CTIR). We focus on ways to combine evidence to improve CTIR effectiveness, proposing and testing several ways to handle language change during book search. We find that a principled combination of three sources of evidence during relevance feedback yields strong CTIR performance.

Miles Efron
