Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation

User interests are usually dynamic in the real world, which poses both theoretical and practical challenges for learning accurate preferences from rich behavior data. Among existing user behavior modeling solutions, attention networks are widely adopted for their effectiveness and relative simplicity. Despite being extensively studied, existing attention mechanisms still suffer from two limitations: i) conventional attention mainly accounts for the spatial correlation between user behaviors, regardless of the distance between those behaviors in the continuous time space; and ii) these attention mechanisms mostly produce a dense and undifferentiated distribution over all past behaviors and then attentively encode them into the output latent representations. This is, however, not suitable in practical scenarios where a user's future actions are relevant to only a small subset of her/his historical behaviors. In this paper, we propose a novel attention network, named self-modulating attention, that models complex and non-linearly evolving dynamic user preferences. We empirically demonstrate the effectiveness of our method on top-N sequential recommendation tasks, and the results on three large-scale real-world datasets show that our model achieves state-of-the-art performance.
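
To make the two limitations above concrete, the following minimal sketch (a hypothetical illustration, not the formulation proposed in the paper) shows how a standard scaled dot-product attention score, which captures only the spatial correlation between behaviors, can additionally be modulated by the elapsed continuous time before normalization. The function name `time_modulated_attention` and the exponential `decay` rate are illustrative assumptions introduced here for exposition only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def time_modulated_attention(query, keys, values, timestamps, t_now, decay=0.1):
    # Hypothetical sketch: scaled dot-product scores (spatial correlation)
    # are damped by the elapsed continuous time between each past behavior
    # and the prediction time t_now, then normalized into attention weights.
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)        # (1, n) spatial correlation
    time_gap = t_now - timestamps               # (n,) distance in continuous time
    scores = scores - decay * time_gap          # older behaviors are down-weighted
    weights = softmax(scores, axis=-1)          # (1, n) attention distribution
    return weights @ values, weights

# Toy usage: one query attending over 5 historical behaviors.
rng = np.random.default_rng(0)
d_model = 8
keys = rng.normal(size=(5, d_model))
values = rng.normal(size=(5, d_model))
query = rng.normal(size=(1, d_model))
timestamps = np.array([1.0, 3.5, 7.2, 12.0, 15.3])  # interaction times of past behaviors
output, weights = time_modulated_attention(query, keys, values, timestamps, t_now=16.0)
print(weights.round(3))
```

In this toy setting, behaviors far from the prediction time receive exponentially smaller weights, so the resulting distribution concentrates on a small subset of recent or strongly correlated behaviors rather than spreading densely over the whole history, which is the intuition behind modulating attention in continuous time.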
