Lighter and Better: Low-Rank Decomposed Self-Attention Networks for Next-Item Recommendation

Self-attention networks (SANs) have been widely applied in sequential recommenders, but they suffer from two limitations: (1) quadratic time and space complexity and vulnerability to over-parameterization in self-attention; (2) imprecise modeling of sequential relations between items due to implicit position encoding. In this work, we propose low-rank decomposed self-attention networks (LightSANs) to overcome these problems. Specifically, we introduce low-rank decomposed self-attention, which projects the user's historical items onto a small, constant number of latent interests and leverages item-to-interest interactions to generate context-aware representations. It scales linearly with the length of the user's historical sequence in both time and space, and it is more resilient to over-parameterization. In addition, we design a decoupled position encoding that models the sequential relations between items more precisely. Extensive experiments on three real-world datasets show that LightSANs outperforms existing SAN-based recommenders in terms of both effectiveness and efficiency.
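
The core idea of the low-rank decomposed self-attention can be illustrated with a brief PyTorch sketch (not the authors' implementation): the module name LowRankSelfAttention, the parameter k_interests, and the projection interest_proj are illustrative names. The sketch only assumes what the abstract states, namely that the n historical item representations are first aggregated into k latent interests (k << n) and that each item then attends over those k interests, which yields O(n*k) rather than O(n^2) cost.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LowRankSelfAttention(nn.Module):
    """Minimal sketch of low-rank decomposed self-attention: n items are
    summarized into k latent interests, and items attend over the k
    interests instead of all n items (linear in sequence length)."""

    def __init__(self, hidden_size: int, k_interests: int):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)
        # Learned mapping from item representations to interest-mixing scores.
        self.interest_proj = nn.Linear(hidden_size, k_interests)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, hidden)
        q, k, v = self.query(x), self.key(x), self.value(x)

        # Aggregate the n items into k latent interests: softmax over the
        # sequence dimension gives per-interest mixing weights.
        mix = F.softmax(self.interest_proj(x), dim=1)            # (batch, n, k)
        interest_k = torch.einsum("bnk,bnd->bkd", mix, k)        # (batch, k, hidden)
        interest_v = torch.einsum("bnk,bnd->bkd", mix, v)        # (batch, k, hidden)

        # Item-to-interest attention: each item queries the k interests,
        # so the score matrix is (n x k) instead of (n x n).
        scores = torch.matmul(q, interest_k.transpose(1, 2))     # (batch, n, k)
        scores = scores / (x.size(-1) ** 0.5)
        attn = F.softmax(scores, dim=-1)
        return torch.matmul(attn, interest_v)                    # (batch, n, hidden)


# Usage: a batch of 32 sequences of 200 items with hidden size 64,
# compressed to 10 latent interests.
layer = LowRankSelfAttention(hidden_size=64, k_interests=10)
out = layer(torch.randn(32, 200, 64))  # -> (32, 200, 64)
```

Because the attention map has shape (n, k) with k fixed, both memory and compute grow linearly with the sequence length n, matching the scaling claim in the abstract; the decoupled position encoding is not shown in this sketch.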
