INEX2014: Tweet Contextualization Using Association Rules between Terms

Tweets are short messages that do not exceed 140 characters. Because they must be written within this limit, they use a particular vocabulary, so a reader often needs to know their context to understand them. In this paper, we describe the approach we submitted to the Tweet Contextualization track of CLEF 2014 (Conference and Labs of the Evaluation Forum). This approach extends the tweet's vocabulary with a set of thematically related words obtained by mining association rules between terms.

Web 2.0 is the term associated with the transition of the World Wide Web from a collection of individual web sites to an emerging platform in its own right. This emergence is largely due to user collaboration: users have been the driving force behind the emergence of new services (1). One of these is microblogging, e.g., Twitter, a communication medium and collaboration system for broadcasting short messages. In contrast to traditional blogs, media-sharing services and social networks, microblogs (tweets) are textual messages posted in real time to report an idea, a current interest, or an opinion (2). The size of these messages may be limited to a maximum number of characters. This length constraint leads to the use of a particular vocabulary, the aim being to convey as much information as possible in as few characters as possible (3).

In this respect, we focus on the Tweet Contextualization track. Participants of INEX 2014 are required to contextualize tweets: given a tweet and a related entity, they try to answer questions of the form "Why does this tweet concern this entity? Should it be an alert?". These questions can be answered by several sentences or by an aggregation of texts from different Wikipedia articles.
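To make the idea of extending a tweet's vocabulary with association rules between terms more concrete, the following is a minimal sketch, not the system described in this paper. It assumes naive whitespace tokenization, rules of the form "term a => term b" mined from document-level co-occurrence, the classic support and confidence measures, and arbitrary thresholds. The function names `mine_term_rules` and `expand_tweet`, the thresholds, and the toy corpus are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_term_rules(documents, min_support=2, min_confidence=0.5):
    """Mine single-term rules premise -> conclusion (illustrative sketch).

    support(a, b)      = number of documents containing both a and b
    confidence(a -> b) = support(a, b) / number of documents containing a
    """
    term_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        terms = set(doc.lower().split())  # assumption: naive whitespace tokenization
        term_counts.update(terms)
        pair_counts.update(combinations(sorted(terms), 2))

    rules = {}
    for (a, b), support in pair_counts.items():
        if support < min_support:
            continue  # discard infrequent term pairs
        # A frequent pair can yield a rule in each direction.
        for premise, conclusion in ((a, b), (b, a)):
            confidence = support / term_counts[premise]
            if confidence >= min_confidence:
                rules.setdefault(premise, []).append((conclusion, confidence))
    return rules


def expand_tweet(tweet, rules, max_new_terms=5):
    """Return new terms implied by rules whose premise occurs in the tweet."""
    tweet_terms = set(tweet.lower().split())
    candidates = {}
    for term in tweet_terms:
        for conclusion, confidence in rules.get(term, []):
            if conclusion not in tweet_terms:
                # Keep the best confidence seen for each candidate term.
                candidates[conclusion] = max(confidence, candidates.get(conclusion, 0.0))
    # Rank by decreasing confidence, breaking ties alphabetically.
    ranked = sorted(candidates, key=lambda t: (-candidates[t], t))
    return ranked[:max_new_terms]


if __name__ == "__main__":
    # Hypothetical toy corpus standing in for a real document collection.
    corpus = [
        "paris france eiffel tower",
        "paris france louvre museum",
        "eiffel tower paris night",
        "london england big ben",
    ]
    rules = mine_term_rules(corpus)
    print(expand_tweet("Sunset at the Eiffel tower", rules))  # → ['paris']
```

Here the rules "eiffel => paris" and "tower => paris" both have confidence 1.0, so "paris" is proposed as a thematically related term that does not appear in the tweet itself. A real system would at least add stop-word removal, stemming and a much larger collection.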