Real-time, scalable, content-based Twitter users recommendation

Real-time recommendation of Twitter users based on the content of their profiles is a very challenging task. Traditional IR methods such as TF-IDF fail to handle large datasets efficiently. In this paper we present a scalable approach that allows real-time recommendation of users based on their tweets. Our model builds a graph of terms, driven by the observation that users with similar interests tend to use similar terms. We show how this model can be encoded as a compact binary footprint that allows very fast comparison and ranking, taking full advantage of modern CPU architectures. We validate our approach through an empirical evaluation against Apache Lucene's implementation of TF-IDF. We show that our approach is on average two hundred times faster than a standard optimized implementation of TF-IDF, with a precision of 58%.
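To make the idea of comparing binary footprints concrete, the following is a minimal sketch and not the method of the paper. The abstract does not specify how the term graph is encoded or which similarity measure is used, so everything here is an assumption for illustration: each user is reduced to a bitset over a fixed term vocabulary, and two users are compared with the Jaccard similarity of their bitsets, computed with bitwise AND/OR and a population count. The function names (`footprint`, `similarity`, `rank`) are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def footprint(terms: Iterable[str], vocab: Dict[str, int]) -> int:
    """Encode a user's terms as a bitset (ASSUMED encoding, not the paper's).

    Bit i is set when the term mapped to index i in `vocab` occurs.
    Terms missing from the vocabulary are ignored.
    """
    bits = 0
    for term in terms:
        index = vocab.get(term)
        if index is not None:
            bits |= 1 << index
    return bits


def popcount(bits: int) -> int:
    """Count the set bits of a non-negative integer."""
    return bin(bits).count("1")


def similarity(a: int, b: int) -> float:
    """Jaccard similarity of two bitsets (ASSUMED measure, for illustration)."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def rank(query: int, candidates: Dict[str, int], k: int) -> List[Tuple[str, float]]:
    """Return the k candidates most similar to `query`, best first.

    Ties are broken by candidate name so the order is deterministic.
    """
    scored = [(name, similarity(query, fp)) for name, fp in candidates.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


if __name__ == "__main__":
    # Toy vocabulary and users, invented for this example.
    vocab = {"python": 0, "graph": 1, "music": 2, "jazz": 3}
    alice = footprint(["python", "graph"], vocab)
    others = {
        "bob": footprint(["graph", "music"], vocab),
        "carol": footprint(["music", "jazz"], vocab),
    }
    print(rank(alice, others, k=2))
```

In this sketch a comparison costs a handful of bitwise operations regardless of how many tweets a user has written, which is the kind of property that makes a fixed-width footprint attractive on modern CPUs. A native implementation would typically use fixed-size machine words and a hardware population-count instruction rather than Python integers.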
