Towards an Expressive and Scalable Twitter's Users Profiles

Microblogging websites such as Twitter produce a tremendous amount of data each second. Consequently, real-time recommendation systems require very efficient algorithms to process this massive amount of data quickly. In this paper we propose a scalable and extensible way of building content-based user profiles. Scalability refers to the complexity of the algorithms involved in building user profiles, relative to state-of-the-art solutions. Extensibility means that the model need not be recomputed for newcomers. We present a tractable algorithm to build user profiles out of their tweets. Our model is a graph of term co-occurrences, driven by the fact that users sharing similar interests will share similar terms. We then present how this model can be encoded as a binary footprint, thereby speeding up the comparison of users. We provide an empirical study to measure how the distance between users in the hash space differs from the distance between users obtained with standard Information Retrieval techniques. This experiment is based on a Twitter dataset we crawled, which comprises 25K users and 1 million tweets. Our approach is driven by real-time analysis requirements and is thus oriented toward a trade-off between expressivity and efficiency. Experimental results show that our approach outperforms the vector space model by three orders of magnitude, with a precision of 58%.
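
The pipeline described above (tweets, then a term co-occurrence graph, then a binary footprint, then a distance in the hash space) can be illustrated with a minimal Python sketch. The abstract does not specify the tokenizer, the encoding scheme, or the distance, so everything below is an assumption made for illustration: the function names `tokenize`, `cooccurrence_edges`, `footprint`, and `hamming` are hypothetical, the footprint hashes each graph edge to one bit of a fixed-width bitmask (a Bloom-filter-like choice), and the Hamming distance stands in for the hash-space distance. It should not be read as the authors' actual algorithm.

```python
import hashlib
from itertools import combinations

PUNCT = ".,!?:;\"'()"

def tokenize(tweet):
    # Assumption: naive whitespace tokenizer, lowercased, surrounding punctuation stripped.
    tokens = (w.strip(PUNCT).lower() for w in tweet.split())
    return [t for t in tokens if t]

def cooccurrence_edges(tweets):
    """Return the set of unordered term pairs that co-occur in at least one tweet."""
    edges = set()
    for tweet in tweets:
        terms = sorted(set(tokenize(tweet)))
        # combinations over a sorted list yields each pair once, in canonical order.
        edges.update(combinations(terms, 2))
    return edges

def footprint(edges, n_bits=256):
    """Encode an edge set as an n_bits-wide integer bitmask (assumed scheme:
    each edge is hashed to a single bit position)."""
    bits = 0
    for a, b in edges:
        digest = hashlib.sha1(f"{a}\x00{b}".encode("utf-8")).digest()
        bits |= 1 << (int.from_bytes(digest[:8], "big") % n_bits)
    return bits

def hamming(fp_a, fp_b):
    """Number of bit positions where two footprints differ."""
    return bin(fp_a ^ fp_b).count("1")

if __name__ == "__main__":
    alice = ["Graph hashing for recommender systems", "Scalable user profiles"]
    bob = ["Scalable recommender systems", "Football tonight!"]
    fp_alice = footprint(cooccurrence_edges(alice))
    fp_bob = footprint(cooccurrence_edges(bob))
    print("Hamming distance:", hamming(fp_alice, fp_bob))
```

Under these assumptions, a newcomer's footprint is computed from that user's tweets alone, and comparing two users reduces to one XOR and a bit count, which is the kind of property the extensibility and efficiency claims above rely on.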
