SimClusters: Community-Based Representations for Heterogeneous Recommendations at Twitter

Personalized recommendation products at Twitter target a multitude of heterogeneous items: Tweets, Events, Topics, Hashtags, and users. Each of these targets varies in its cardinality (which affects the scale of the problem) and its "shelf life" (which constrains the latency of generating the recommendations). Although Twitter has built a variety of recommendation systems over the past decade, the broader problem was mostly tackled piecemeal. In this paper, we present SimClusters, a general-purpose representation layer based on overlapping communities, in which users as well as heterogeneous content can be represented as sparse, interpretable vectors to support a multitude of recommendation tasks. We propose a novel algorithm for community discovery based on Metropolis-Hastings sampling, which is both more accurate and significantly faster than off-the-shelf alternatives. SimClusters scales to networks with billions of users and has been effective across a variety of deployed applications at Twitter.