Tweet Contextualization Approach Based on Wikipedia and Dbpedia

Limited to 140 characters, tweets are short and often ignore formal grammar and standard spelling. These spelling variations increase the likelihood of vocabulary mismatch and make tweets difficult to understand without context. This paper addresses the tweet contextualization task, which aims to automatically provide a summary that explains a given tweet so that a reader can understand it. We propose several tweet expansion approaches that use Wikipedia and Dbpedia as external knowledge sources. Each approach has two steps: the first generates candidate terms for a given tweet, and the second ranks and selects these candidate terms using a similarity measure. An experimental study conducted on the INEX 2014 collection demonstrates the effectiveness of our methods.

RÉSUMÉ. The length of tweets is limited to a maximum number of characters. This constraint on message size leads to the use of a particular vocabulary that makes a tweet difficult to understand. The tweet contextualization task aims to automatically provide a summary that explains a given tweet, allowing the reader to understand it well. To this end, we propose different methods based on two very large knowledge sources, namely Wikipedia and Dbpedia. The effectiveness of our method is demonstrated by an experimental study conducted on the INEX 2014 collection.
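
The two-step scheme outlined in the abstract (candidate term generation, then similarity-based ranking) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not name the similarity measure, so cosine similarity over bag-of-words vectors is assumed here, and `TOY_KNOWLEDGE`, `tokenize`, `cosine`, and `expand_tweet` are hypothetical names, with a small in-memory dictionary standing in for Wikipedia and Dbpedia.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for an external knowledge source
# (the paper uses Wikipedia and Dbpedia; this toy dictionary is an assumption).
TOY_KNOWLEDGE = {
    "nasa": "space agency rocket launch mars mission",
    "mars": "planet rover space mission",
    "obama": "president united states white house",
}


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_tweet(tweet, knowledge, k=3):
    """Return up to k (term, score) pairs that could expand the tweet."""
    tweet_tokens = tokenize(tweet)
    tweet_vec = Counter(tweet_tokens)
    seen = set(tweet_tokens)

    # Step 1: generate candidate terms from entries matched by tweet tokens.
    candidates = set()
    for token in seen:
        if token in knowledge:
            candidates |= set(tokenize(knowledge[token])) - seen

    # Step 2: rank candidates by similarity between the tweet and the
    # combined text of all entries that mention the candidate.
    scored = []
    for cand in candidates:
        context = Counter()
        for key, text in knowledge.items():
            entry_tokens = [key] + tokenize(text)
            if cand in entry_tokens:
                context.update(entry_tokens)
        scored.append((cand, cosine(tweet_vec, context)))

    # Sort by descending score, breaking ties alphabetically.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


if __name__ == "__main__":
    for term, score in expand_tweet("NASA launch to Mars delayed", TOY_KNOWLEDGE):
        print(f"{term}\t{score:.3f}")
```

In a fuller system, the selected terms would presumably be appended to the tweet to form an expanded query against the document collection; that retrieval and summarization stage is outside this sketch.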
