Real-time, Scalable, Content-based Twitter Users Recommendation

Real-time recommendation of Twitter users based on the content of their profiles is a very challenging task. Traditional IR methods such as TF-IDF fail to handle large datasets efficiently. In this paper we present a scalable approach that allows real-time recommendation of users based on their tweets. Our model builds a graph of terms, driven by the observation that users with similar interests will use similar terms. We show how this model can be encoded as a compact binary footprint that allows very fast comparison and ranking, taking full advantage of modern CPU architectures. We validate our approach through an empirical evaluation against Apache Lucene's implementation of TF-IDF. We show that our approach is on average two hundred times faster than a standard optimised implementation of TF-IDF, with a precision of 58%. The work presented here has been published in The Web Intelligence Journal.
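
To illustrate why a binary footprint lends itself to fast comparison and ranking, here is a minimal sketch. The abstract does not specify how the footprint is laid out or which similarity measure is used, so everything here is an assumption: footprints are taken to be fixed-width bit vectors stored as integers, similarity is taken to be Hamming distance (XOR followed by a bit count, which maps to a single population-count instruction on most modern CPUs), and the names `hamming_distance` and `rank_users` are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where two footprints differ (assumed measure)."""
    return bin(a ^ b).count("1")


def rank_users(query: int, footprints: Dict[str, int], k: int = 3) -> List[Tuple[str, int]]:
    """Return the k users whose footprints are closest to the query footprint.

    `footprints` maps a user name to that user's footprint (hypothetical layout).
    Ties on distance are broken by user name.
    """
    scored = ((hamming_distance(query, fp), user) for user, fp in footprints.items())
    return [(user, dist) for dist, user in heapq.nsmallest(k, scored)]


if __name__ == "__main__":
    # Toy 4-bit footprints; a real footprint would be much wider.
    users = {"alice": 0b1011, "bob": 0b0100, "carol": 0b1111}
    print(rank_users(0b1010, users, k=2))
```

Each comparison in this sketch costs a handful of bitwise operations regardless of vocabulary size, which is the kind of property the abstract credits for the speed-up over TF-IDF; the actual encoding in the paper may differ.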