GRP: A Gumbel-based Rating Prediction Framework for Imbalanced Recommendation

Rating prediction is a core problem in recommender systems: it quantifies users' preferences towards different items. Because rating distributions in training data are imbalanced, existing recommendation methods tend to produce biased predictions, so their performance on ratings that rarely appear in the training data is unsatisfactory. In this paper, inspired by the superior capability of Extreme Value Distribution (EVD)-based methods in modeling the distribution of rare data, we propose a novel \underline{\emph{G}}umbel Distribution-based \underline{\emph{R}}ating \underline{\emph{P}}rediction framework (GRP) that can accurately predict both frequent and rare ratings between users and items. In our approach, we first define a separate Gumbel distribution for each rating level, learned from the historical rating statistics of users and items. Second, we fuse the Gumbel-based representations of users and items with their original representations, learned from the rating matrix and/or reviews, through a proposed multi-scale convolutional fusion layer that enriches the user and item representations. Third, we propose a data-driven rating prediction module to predict the ratings of user-item pairs. It is worth noting that our approach can be readily applied to existing recommendation methods to address their biased prediction problem. To verify the effectiveness of GRP, we conduct extensive experiments on eight benchmark datasets. Compared with several baseline models, the results show that: 1) GRP achieves state-of-the-art overall performance on all eight datasets; 2) GRP substantially improves the prediction of rare ratings, which shows the effectiveness of our model in addressing the biased prediction problem.
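As a rough illustration of the first step, the sketch below computes the fraction of a user's historical ratings at each level and evaluates a Gumbel density at each fraction. It is only a sketch under stated assumptions: the abstract does not give GRP's actual parametrization, so the function names and the per-level parameters `mus` and `betas` (stand-ins for values that would be learned) are hypothetical. Only the Gumbel density itself, f(x; mu, beta) = exp(-(z + exp(-z))) / beta with z = (x - mu) / beta, is standard.

```python
import math
from collections import Counter

LEVELS = (1, 2, 3, 4, 5)  # assumed rating scale; the abstract does not fix one


def gumbel_pdf(x, mu, beta):
    """Density of a Gumbel(mu, beta) distribution at x, with beta > 0."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    z = (x - mu) / beta
    return math.exp(-(z + math.exp(-z))) / beta


def rating_level_frequencies(ratings, levels=LEVELS):
    """Fraction of the given historical ratings that fall on each level."""
    if not ratings:
        raise ValueError("ratings must be non-empty")
    counts = Counter(ratings)
    return [counts.get(level, 0) / len(ratings) for level in levels]


def gumbel_features(ratings, mus, betas, levels=LEVELS):
    """One Gumbel density value per rating level (hypothetical feature vector).

    mus and betas are placeholders for per-level parameters that a real
    model would learn; they are NOT taken from the paper.
    """
    freqs = rating_level_frequencies(ratings, levels)
    return [gumbel_pdf(f, mu, beta) for f, mu, beta in zip(freqs, mus, betas)]


if __name__ == "__main__":
    history = [5, 5, 4, 1]  # toy rating history for one user
    print(rating_level_frequencies(history))
    print(gumbel_features(history, mus=[0.2] * 5, betas=[0.1] * 5))
```

In a full model the resulting vector would be one input to the fusion layer described above; here it only shows how per-level rating statistics can be passed through a Gumbel density.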
