Tweet Contextualization Based on Wikipedia and DBpedia

Bound to 140 characters, tweets are short and often ignore formal grammar and standard spelling. These spelling variations increase the likelihood of vocabulary mismatch and make tweets difficult to understand without context. This paper addresses the tweet contextualization task, which aims to automatically provide a summary that explains a given tweet so that a reader can understand it. We propose several tweet expansion approaches that use Wikipedia and DBpedia as external knowledge sources. Each approach proceeds in two steps: the first generates candidate terms for a given tweet, and the second ranks and selects these candidate terms using a similarity measure. An experimental study conducted on the INEX 2014 collection demonstrates the effectiveness of our methods.
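
The two-step scheme (candidate generation, then ranking by a similarity measure) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy dictionary `TOY_KB` stands in for Wikipedia and DBpedia lookups, cosine similarity over bag-of-words vectors stands in for the unspecified similarity measure, and the function names are hypothetical. It is not the method evaluated in the paper.

```python
import math
from collections import Counter

# Hypothetical stand-in for Wikipedia/DBpedia: term -> short description.
TOY_KB = {
    "python": "python programming language software code",
    "snake": "snake reptile animal",
    "django": "django web framework python software",
}


def tokenize(text):
    """Lowercase whitespace tokenizer that keeps alphanumeric tokens only."""
    return [t for t in text.lower().split() if t.isalnum()]


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def generate_candidates(tweet, kb):
    """Step 1 (assumed): keep KB terms whose description shares a word
    with the tweet and that do not already occur in the tweet."""
    tweet_tokens = set(tokenize(tweet))
    return [
        term
        for term, description in kb.items()
        if term not in tweet_tokens and tweet_tokens & set(tokenize(description))
    ]


def rank_candidates(tweet, candidates, kb, top_k=2):
    """Step 2 (assumed): score each candidate by similarity to the tweet,
    sort by descending score (ties broken alphabetically), keep top_k."""
    tweet_tokens = tokenize(tweet)
    scored = [(term, cosine(tweet_tokens, tokenize(kb[term]))) for term in candidates]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


def expand_tweet(tweet, kb=TOY_KB, top_k=2):
    """Run both steps and return the selected expansion terms."""
    candidates = generate_candidates(tweet, kb)
    return [term for term, _ in rank_candidates(tweet, candidates, kb, top_k)]
```

For the tweet "new software code release", this sketch selects `python` ahead of `django`, because the toy description of `python` shares two words with the tweet and that of `django` only one.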
