Sequential Recommendation with Relation-Aware Kernelized Self-Attention

Recent studies have shown that the attention mechanism improves sequential recommendation. Following this development, we propose Relation-Aware Kernelized Self-Attention (RKSA), which augments the self-attention mechanism of the Transformer with a probabilistic model. The original self-attention of the Transformer is a deterministic measure without relation-awareness. We therefore introduce a latent space into the self-attention; this latent space models the recommendation context from relations as a multivariate skew-normal distribution with a kernelized covariance matrix built from co-occurrences, item characteristics, and user information. This work merges the self-attention of the Transformer with sequential recommendation by adding a probabilistic model of the specifics of the recommendation task. We evaluated RKSA on benchmark datasets, where it shows significant improvements over recent baseline models. RKSA also produces a latent space model that explains the reasons for a recommendation.
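
The idea of a latent variable with a kernelized covariance inside self-attention can be illustrated with a minimal NumPy sketch. This is not the RKSA implementation. Everything specific here is an assumption made for illustration: the RBF kernel over item embeddings stands in for the relation-based kernels, the latent sample is simply added to the attention logits, the shape vector `alpha` is fixed rather than learned, and the causal mask used in sequential recommendation is omitted. The names `rbf_correlation`, `sample_skew_normal`, and `latent_self_attention` are hypothetical. The sampler uses the standard stochastic representation of the multivariate skew-normal distribution.

```python
import numpy as np

def rbf_correlation(x, length_scale=1.0, eps=1e-3):
    """RBF kernel matrix with unit diagonal (assumed stand-in for a relation kernel)."""
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    k = np.exp(-sq / (2.0 * length_scale ** 2))
    # Blend with the identity: keeps the diagonal at 1 and makes the matrix positive definite.
    return (1.0 - eps) * k + eps * np.eye(x.shape[0])

def sample_skew_normal(omega, alpha, rng):
    """Draw one sample from SN(0, omega, alpha) via the hidden-truncation representation."""
    n = omega.shape[0]
    delta = omega @ alpha / np.sqrt(1.0 + alpha @ omega @ alpha)
    # Augmented correlation matrix of (x0, x) with corr(x0, x) = delta.
    aug = np.empty((n + 1, n + 1))
    aug[0, 0] = 1.0
    aug[0, 1:] = delta
    aug[1:, 0] = delta
    aug[1:, 1:] = omega
    draw = np.linalg.cholesky(aug) @ rng.standard_normal(n + 1)
    x0, x = draw[0], draw[1:]
    # Flip the sign when the hidden variable is not positive.
    return x if x0 > 0 else -x

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def latent_self_attention(x, w_q, w_k, w_v, alpha, rng):
    """Scaled dot-product self-attention with a skew-normal latent added to the logits."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    logits = q @ k.T / np.sqrt(k.shape[1])
    omega = rbf_correlation(x)
    # One latent draw per query position (an assumption, not the paper's formulation).
    z = np.stack([sample_skew_normal(omega, alpha, rng) for _ in range(x.shape[0])])
    weights = softmax(logits + z)
    return weights @ v, weights

rng = np.random.default_rng(0)
n_items, dim = 5, 8
x = rng.standard_normal((n_items, dim))            # toy item embeddings
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))
alpha = np.full(n_items, 0.5)                      # fixed skewness, purely illustrative
out, weights = latent_self_attention(x, w_q, w_k, w_v, alpha, rng)
omega = rbf_correlation(x)
```

Because the latent sample is random, the attention weights differ between calls unless the generator is reseeded; a trained model would instead learn the distribution parameters and use a reparameterized sample.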
