BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer

Modeling users' dynamic preferences from their historical behaviors is challenging and crucial for recommendation systems. Previous methods employ sequential neural networks to encode users' historical interactions from left to right into hidden representations for making recommendations. Despite their effectiveness, we argue that such left-to-right unidirectional models are sub-optimal due to limitations including: (a) unidirectional architectures restrict the power of the hidden representations of users' behavior sequences; (b) they often assume a rigidly ordered sequence, which is not always practical. To address these limitations, we propose a sequential recommendation model called BERT4Rec, which employs deep bidirectional self-attention to model user behavior sequences. To avoid information leakage and train the bidirectional model efficiently, we apply the Cloze objective to sequential recommendation, predicting randomly masked items in the sequence by jointly conditioning on their left and right context. In this way, we learn a bidirectional representation model that makes recommendations by allowing each item in a user's historical behaviors to fuse information from both the left and right sides. Extensive experiments on four benchmark datasets show that our model consistently outperforms various state-of-the-art sequential models.
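The Cloze objective described in the abstract can be illustrated with a minimal sketch of the data-preparation step: items in a user's sequence are randomly replaced by a mask token, and the model is then asked to recover them from the context on both sides. The function name `cloze_mask`, the reserved id `MASK_ID`, the default masking probability, and the fallback that forces at least one masked position are all assumptions made for illustration; they are not taken from the paper.

```python
import random

# Hypothetical id reserved for the mask token; real item ids are assumed to start at 1.
MASK_ID = 0


def cloze_mask(sequence, mask_prob=0.2, rng=None):
    """Randomly mask items in one user's interaction sequence.

    Returns (masked_sequence, labels). labels[i] holds the original item
    when position i was masked, and None otherwise, so a training loss
    would be computed only at the masked positions.
    """
    if rng is None:
        rng = random.Random()
    masked, labels = [], []
    for item in sequence:
        if rng.random() < mask_prob:
            masked.append(MASK_ID)   # hide the item from the model
            labels.append(item)      # keep it as the prediction target
        else:
            masked.append(item)
            labels.append(None)
    # Assumed safeguard: make sure every non-empty sequence has a target.
    if sequence and all(label is None for label in labels):
        pos = rng.randrange(len(sequence))
        labels[pos] = sequence[pos]
        masked[pos] = MASK_ID
    return masked, labels


if __name__ == "__main__":
    history = [12, 7, 33, 5, 18, 41]  # made-up item ids
    masked, labels = cloze_mask(history, mask_prob=0.3, rng=random.Random(0))
    print(masked)
    print(labels)
```

Because a masked position can sit anywhere in the sequence, the items before and after it are both visible to the model, which is what allows a bidirectional encoder to be trained without the target leaking into its own input.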
