Harnessing linked knowledge sources for topic classification in social media

Topic classification (TC) of short text messages offers a fast and effective way to reveal events happening around the world, ranging from those related to Disaster (e.g. Hurricane Sandy) to those related to Violence (e.g. the Egyptian revolution). Previous approaches to TC have mostly focused on exploiting individual knowledge sources (KSs) (e.g. DBpedia or Freebase), without considering the graph structures that surround the concepts in those KSs when detecting the topics of Tweets. In this paper we introduce a novel approach for harnessing such graph structures from multiple linked KSs by: (i) building a conceptual representation of the KSs, (ii) leveraging contextual information about concepts by exploiting semantic concept graphs, and (iii) providing a principled way to combine KSs. Experiments evaluating our TC classifier in the context of Violence Detection (VD) and Emergency Responses (ER) show promising results that significantly outperform various baseline models, including an approach using a single KS without linked data and an approach using only Tweets.
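As a rough illustration of the general idea of enriching a Tweet with concepts drawn from more than one KS, the following minimal sketch builds a bag of lexical and concept features. It is not the method evaluated in this paper: the two toy graphs `KS_A` and `KS_B`, their contents, and the helper names `concept_features` and `tweet_features` are all hypothetical, and simple per-source feature prefixing stands in for the principled combination described above.

```python
from collections import Counter

# Toy stand-ins for two linked knowledge sources (hypothetical data):
# each maps an entity mention to concepts adjacent to it in that source's graph.
KS_A = {
    "sandy": ["Hurricane", "NaturalDisaster"],
    "cairo": ["City"],
}
KS_B = {
    "sandy": ["Tropical_cyclone"],
    "cairo": ["Protest_location", "City"],
}


def concept_features(entities, sources):
    """Count the concepts linked to each entity, keeping sources apart by prefix."""
    feats = Counter()
    for source_name, graph in sources.items():
        for entity in entities:
            for concept in graph.get(entity, []):
                feats[f"{source_name}:{concept}"] += 1
    return feats


def tweet_features(tokens, sources):
    """Combine plain word features with concept features from every source."""
    feats = Counter(f"word:{token}" for token in tokens)
    # Assumption: every token is tried as an entity mention; a real system
    # would run entity linking first.
    feats.update(concept_features(tokens, sources))
    return feats


features = tweet_features(["sandy", "hits", "coast"], {"A": KS_A, "B": KS_B})
print(sorted(features))
```

The resulting feature counts could then be passed to any standard text classifier; keeping a prefix per source lets the classifier weight each KS separately.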
