Addressing the Rank Degeneration in Sequential Recommendation via Singular Spectrum Smoothing

Sequential recommendation (SR) models dynamic user preferences and predicts the next item a user will interact with. The next-item preference is typically computed as the affinity between the sequence and item representations. However, due to data sparsity, both sequence and item representations suffer from rank degeneration, which significantly impairs representation quality for SR. This motivates us to measure how severe the rank degeneration is and to alleviate it on the sequence and item sides simultaneously. In this work, we theoretically connect sequence representation degeneration with item rank degeneration, particularly for short sequences and cold items. We also identify the connection between fast singular value decay and rank collapse in transformer sequence outputs and item embeddings. We propose the area under the singular value curve metric to quantify the severity of singular value decay and use it as an indicator of rank degeneration. We further introduce a novel singular spectrum smoothing regularization that alleviates rank degeneration on both the sequence and item sides, yielding Singular sPectrum sMoothing for sequential Recommendation (SPMRec). We also establish a correlation between the ranks of the sequence and item embeddings and the rank of the user-item preference prediction matrix, which affects recommendation diversity. Experiments on four benchmark datasets demonstrate the superiority of SPMRec over state-of-the-art recommendation methods, especially on short sequences, and reveal a strong connection between the proposed singular spectrum smoothing and recommendation diversity.
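
To make the two quantities above concrete, the sketch below shows one plausible way to compute the area under the normalized singular value curve of an embedding matrix and to turn it into a smoothing regularizer added to the next-item prediction loss. This is a minimal PyTorch illustration under our own assumptions, not the paper's reference implementation; the function names (`area_under_singular_value_curve`, `singular_spectrum_smoothing_loss`) and the weights `lambda_item` / `lambda_seq` in the usage comment are hypothetical.

```python
import torch


def singular_value_curve(embeddings: torch.Tensor) -> torch.Tensor:
    """Singular values of an embedding matrix, normalized so the largest equals 1.

    embeddings: (num_rows, dim), e.g. the item embedding table or a batch of
    transformer sequence outputs. A flat curve indicates a healthy (high) rank.
    """
    singular_values = torch.linalg.svdvals(embeddings)
    return singular_values / singular_values.max()


def area_under_singular_value_curve(embeddings: torch.Tensor) -> torch.Tensor:
    """Approximate area under the normalized singular value curve.

    Values near 1 mean slow decay (high effective rank); values near 0 mean
    fast decay, i.e. rank degeneration.
    """
    curve = singular_value_curve(embeddings)
    # Trapezoidal rule over the singular value index, rescaled to [0, 1].
    return torch.trapz(curve, dx=1.0 / (curve.numel() - 1))


def singular_spectrum_smoothing_loss(embeddings: torch.Tensor) -> torch.Tensor:
    """Regularizer that encourages a flatter spectrum by maximizing the area
    under the singular value curve (so we minimize its negation)."""
    return -area_under_singular_value_curve(embeddings)


# Hypothetical use inside a training step, applied to both sides:
# total_loss = next_item_loss \
#     + lambda_item * singular_spectrum_smoothing_loss(item_embeddings) \
#     + lambda_seq * singular_spectrum_smoothing_loss(sequence_outputs)
```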
