Automatic acquisition of a taxonomy of microblogs users' interests

Abstract: Modeling users’ interests plays an important role in the current web, since it underlies many services such as recommendation and customization. Using semantic technologies to represent users’ interests may help to reduce problems such as sparsity, over-specialization and domain-dependency, which are known to be critical issues of state-of-the-art recommenders. In this paper we present a method for high-coverage modeling of Twitter users supported by a hierarchical representation of their interests, which we call a Twixonomy. To automatically build a population, community, or single-user Twixonomy, we first identify “topical” friends in users’ friendship lists (i.e., friends representing an interest rather than a social relation between peers). We classify as topical those users with an associated page on Wikipedia. A word-sense disambiguation algorithm is used to select the appropriate Wikipedia page for each topical friend. Next, starting from the set of wikipages representing the main topics of interest of the considered Twitter population, we extract all paths connecting these pages with topmost Wikipedia category nodes. We then prune the resulting graph efficiently so as to induce a directed acyclic graph and significantly reduce over-ambiguity, a well-known problem of the Wikipedia category graph. We release the Twixonomy produced in this work under a Creative Commons license.
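The path-extraction and cycle-removal steps can be illustrated with a minimal sketch. It assumes the category graph is available as a plain dictionary mapping each node to its parent categories; the node names, the `upward_subgraph` and `break_cycles` helpers, and the toy graph are all hypothetical. Cycles are broken here by dropping depth-first-search back edges, which is a generic stand-in and not the pruning algorithm of the paper; it does not address over-ambiguity.

```python
from collections import deque


def upward_subgraph(parents, seeds, tops):
    """Return the (child, parent) edges lying on some upward path from a
    seed page to one of the top categories."""
    # Forward pass: every node reachable upward from the seed pages.
    reachable = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in reachable:
                reachable.add(parent)
                queue.append(parent)

    # Invert the reachable part of the graph (parent -> children).
    children = {}
    for child in reachable:
        for parent in parents.get(child, ()):
            children.setdefault(parent, []).append(child)

    # Backward pass: keep only nodes that can actually reach a top category.
    keep = {t for t in tops if t in reachable}
    queue = deque(keep)
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in keep:
                keep.add(child)
                queue.append(child)

    return {(c, p) for c in keep for p in parents.get(c, ()) if p in keep}


def break_cycles(edges, seeds):
    """Drop DFS back edges so that the remaining edges form a DAG.
    Recursive for brevity; a large graph would need an iterative DFS."""
    adjacency = {}
    for child, parent in sorted(edges):
        adjacency.setdefault(child, []).append(parent)

    WHITE, GREY, BLACK = 0, 1, 2
    color = {}
    dag = set()

    def visit(node):
        color[node] = GREY
        for parent in adjacency.get(node, ()):
            state = color.get(parent, WHITE)
            if state == GREY:
                continue  # back edge: it would close a cycle, so skip it
            dag.add((node, parent))
            if state == WHITE:
                visit(parent)
        color[node] = BLACK

    for seed in sorted(seeds):
        if color.get(seed, WHITE) == WHITE:
            visit(seed)
    return dag


# Toy graph (hypothetical names): one page, one dead-end category, one cycle.
parents = {
    "page:singer": ["cat:Singers", "cat:Dead_end"],
    "cat:Singers": ["cat:Musicians"],
    "cat:Musicians": ["cat:Arts", "cat:Singers"],
    "cat:Dead_end": [],
}
seeds = ["page:singer"]
tops = {"cat:Arts"}

edges = upward_subgraph(parents, seeds, tops)
dag = break_cycles(edges, seeds)
```

In this toy graph the dead-end category is discarded because it never reaches a top category, and the edge from `cat:Musicians` back to `cat:Singers` is removed because it closes a cycle.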
