Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation

User interests are usually dynamic in the real world, which poses both theoretical and practical challenges for learning accurate preferences from rich behavior data. Among existing user behavior modeling solutions, attention networks are widely adopted for their effectiveness and relative simplicity. Despite being extensively studied, existing attention mechanisms still suffer from two limitations: i) conventional attention mainly accounts for the spatial correlation between user behaviors, regardless of the distance between those behaviors in the continuous time space; and ii) these attention mechanisms mostly produce a dense and undifferentiated distribution over all past behaviors and then attentively encode them into the output latent representations. This is, however, not suitable in practical scenarios where a user's future actions are relevant to only a small subset of her/his historical behaviors. In this paper, we propose a novel attention network, named self-modulating attention, that models complex and non-linearly evolving dynamic user preferences. We empirically demonstrate the effectiveness of our method on top-N sequential recommendation tasks, and the results on three large-scale real-world datasets show that our model achieves state-of-the-art performance.
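
To make the two limitations above concrete, the following minimal sketch (a hypothetical illustration, not the formulation proposed in the paper) shows how a standard scaled dot-product attention score, which captures only the spatial correlation between behaviors, can additionally be modulated by the elapsed continuous time before normalization. The function name `time_modulated_attention` and the exponential `decay` rate are illustrative assumptions introduced here for exposition only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def time_modulated_attention(query, keys, values, timestamps, t_now, decay=0.1):
    # Hypothetical sketch: scaled dot-product scores (spatial correlation)
    # are damped by the elapsed continuous time between each past behavior
    # and the prediction time t_now, then normalized into attention weights.
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)        # (1, n) spatial correlation
    time_gap = t_now - timestamps               # (n,) distance in continuous time
    scores = scores - decay * time_gap          # older behaviors are down-weighted
    weights = softmax(scores, axis=-1)          # (1, n) attention distribution
    return weights @ values, weights

# Toy usage: one query attending over 5 historical behaviors.
rng = np.random.default_rng(0)
d_model = 8
keys = rng.normal(size=(5, d_model))
values = rng.normal(size=(5, d_model))
query = rng.normal(size=(1, d_model))
timestamps = np.array([1.0, 3.5, 7.2, 12.0, 15.3])  # interaction times of past behaviors
output, weights = time_modulated_attention(query, keys, values, timestamps, t_now=16.0)
print(weights.round(3))
```

In this toy setting, behaviors far from the prediction time receive exponentially smaller weights, so the resulting distribution concentrates on a small subset of recent or strongly correlated behaviors rather than spreading densely over the whole history, which is the intuition behind modulating attention in continuous time.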
