Paper information - LIA/LINA at the INEX 2012 Tweet Contextualization track

In this paper we describe our participation in the INEX 2012 Tweet Contextualization track and present our contributions. We combined Information Retrieval, Automatic Summarization and Topic Modeling techniques to provide the context of each tweet. We first formulate a specific query using the hashtags and important words in the tweet to retrieve the most relevant Wikipedia articles. Then, we segment the articles into sentences and compute several measures for each sentence in order to estimate its contextual relevance to the topics expressed by the tweet. Finally, the highest-scoring sentences are used to form the context. Official results suggest that our methods performed very well compared to those of other participants.
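The pipeline outlined in the abstract (query formulation, sentence segmentation, sentence scoring, selection) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the stopword list, the double weight given to hashtag terms, the regex sentence splitter, and the single overlap-based score, which stands in for the several measures the authors mention. The retrieval step is skipped: the candidate Wikipedia articles are passed in as plain strings.

```python
import re

# Tiny illustrative stopword list (an assumption, not the authors' list).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "for", "and",
             "or", "is", "are", "was", "has", "with", "by", "rt"}


def tokenize(text):
    """Return lowercase alphanumeric tokens; '#' and punctuation are dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_query(tweet):
    """Build a weighted term query from a tweet.

    Hashtag terms get weight 2.0 and other non-stopword terms 1.0 per
    occurrence (a hypothetical weighting chosen for this sketch).
    """
    hashtag_terms = {t for h in re.findall(r"#(\w+)", tweet) for t in tokenize(h)}
    query = {}
    for tok in tokenize(tweet):
        if tok in STOPWORDS:
            continue
        query[tok] = query.get(tok, 0.0) + (2.0 if tok in hashtag_terms else 1.0)
    return query


def split_sentences(article):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", article) if s.strip()]


def score_sentence(sentence, query):
    """Fraction of the total query weight covered by the sentence's terms."""
    total = sum(query.values())
    if total == 0:
        return 0.0
    terms = set(tokenize(sentence))
    return sum(w for t, w in query.items() if t in terms) / total


def contextualize(tweet, articles, max_sentences=2):
    """Return up to max_sentences sentences with the highest non-zero scores."""
    query = build_query(tweet)
    scored = [(score_sentence(s, query), s)
              for article in articles
              for s in split_sentences(article)]
    # The sort is stable, so ties keep their original document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:max_sentences] if score > 0]
```

Calling `contextualize(tweet, articles)` with a tweet string and a list of article texts returns the selected sentences in descending score order. A fuller system would replace `score_sentence` with a combination of measures and would obtain `articles` from a search index over Wikipedia.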

Florian Boudin | Romain Deveaud

[1] Charles L. A. Clarke, et al. Efficient and effective spam filtering and re-ranking for large web datasets, 2010, Information Retrieval.

[2] W. Bruce Croft, et al. A Markov random field model for term dependencies, 2005, SIGIR '05.

[3] Antony J. Williams, et al. Beautiful Data: The Stories Behind Elegant Data Solutions, 2009.

[4] Rada Mihalcea, et al. TextRank: Bringing Order into Text, 2004, EMNLP.

[5] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.