Tuning Word2vec for Large Scale Recommendation Systems

Word2vec is a powerful machine learning tool that emerged from Natural Language Processing (NLP) and is now applied in multiple domains, including recommender systems, forecasting, and network analysis. Because Word2vec is often used off the shelf, we ask whether its default hyperparameters are suitable for recommender systems. The answer is emphatically no. In this paper, we first demonstrate the importance of hyperparameter optimization and show that unconstrained optimization yields an average 221% improvement in hit rate over the default parameters. However, unconstrained optimization leads to hyperparameter settings that are very expensive to train and not feasible for large-scale recommendation tasks. We therefore apply hyperparameter optimization constrained by a runtime budget and demonstrate a 138% average improvement in hit rate. Furthermore, to make hyperparameter optimization applicable to large-scale recommendation problems where the target dataset is too large to search over, we investigate generalizing hyperparameter settings from samples. We show that constrained hyperparameter optimization using only a 10% sample of the data still yields a 91% average improvement in hit rate over the default parameters when the resulting settings are applied to the full datasets. Finally, we apply hyperparameters learned with our method of constrained optimization on a sample to the Who To Follow recommendation service at Twitter and increase follow rates by 15%.
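The workflow described above (search on a small sample, discard settings that exceed a runtime budget, score by hit rate) can be illustrated with a minimal sketch. It is not the paper's implementation: plain random search stands in for whatever optimizer the authors used, and the names `hit_rate_at_k`, `constrained_random_search`, `train_and_eval`, and `budget_seconds` are hypothetical. The caller is assumed to supply `train_and_eval`, a function that trains a model with the given hyperparameters on the sampled sessions and returns a hit rate.

```python
import random
import time


def hit_rate_at_k(recommendations, held_out, k=10):
    """Fraction of users whose held-out item appears in their top-k list."""
    if not held_out:
        return 0.0
    hits = sum(
        1
        for user, item in held_out.items()
        if item in recommendations.get(user, [])[:k]
    )
    return hits / len(held_out)


def constrained_random_search(train_and_eval, search_space, sessions,
                              sample_frac=0.1, n_trials=20,
                              budget_seconds=60.0, seed=0):
    """Random search on a data sample, skipping trials that exceed a time budget.

    train_and_eval(params, sample) is a caller-supplied (hypothetical) function
    that trains on the sample and returns a score such as hit rate.
    """
    rng = random.Random(seed)
    # Search on a fraction of the data instead of the full dataset.
    sample_size = max(1, int(len(sessions) * sample_frac))
    sample = rng.sample(sessions, sample_size)

    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Draw one value per hyperparameter from its candidate list.
        params = {name: rng.choice(values)
                  for name, values in search_space.items()}
        start = time.perf_counter()
        score = train_and_eval(params, sample)
        elapsed = time.perf_counter() - start
        if elapsed > budget_seconds:
            continue  # Too expensive: violates the runtime budget.
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The settings returned by `constrained_random_search` would then be used to train once on the full dataset. Measuring the budget as wall-clock time per trial is one simple choice made for this sketch; a budget could also be expressed relative to the training time of the default settings.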
