Neural Collaborative Filtering vs. Matrix Factorization Revisited

Embedding-based models have been the state of the art in collaborative filtering for over a decade. Traditionally, the dot product or its higher-order equivalents have been used to combine two or more embeddings, most notably in matrix factorization. In recent years, it was suggested to replace the dot product with a learned similarity, e.g., using a multilayer perceptron (MLP). This approach is often referred to as neural collaborative filtering (NCF). In this work, we revisit the experiments of the NCF paper that popularized learned similarities using MLPs. First, we show that with proper hyperparameter selection, a simple dot product substantially outperforms the proposed learned similarities. Second, while an MLP can in theory approximate any function, we show that it is non-trivial to learn a dot product with an MLP. Finally, we discuss practical issues that arise when applying MLP-based similarities and show that MLPs are too costly to use for item recommendation in production environments, while dot products allow very efficient retrieval algorithms to be applied. We conclude that MLPs should be used with care as embedding combiners and that dot products might be a better default choice.
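To make the two ways of combining embeddings concrete, the following is a minimal sketch, not the architecture or training setup used in the NCF paper or in this work. It contrasts a dot-product score with a score from a one-hidden-layer ReLU MLP applied to the concatenated embeddings. The function names, the embedding dimension, the hidden width, and the random untrained weights are all assumptions chosen for illustration.

```python
import random


def dot_similarity(p, q):
    """Matrix-factorization-style score: inner product of two embeddings."""
    return sum(pi * qi for pi, qi in zip(p, q))


def mlp_similarity(p, q, W1, b1, w2, b2):
    """Learned-similarity score: one-hidden-layer ReLU MLP on the concatenation [p, q].

    W1 is a list of rows (one per hidden unit), b1 the hidden biases,
    w2 the output weights, and b2 the output bias. All are hypothetical
    parameters that would normally be learned from data.
    """
    x = p + q  # list concatenation of the two embeddings
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


# Toy data with assumed sizes: 4-dimensional embeddings, 8 hidden units, 5 items.
rng = random.Random(0)
d, hidden_units, n_items = 4, 8, 5
user = [rng.gauss(0, 1) for _ in range(d)]
items = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_items)]

# Untrained random MLP weights, used only to show the shape of the computation.
W1 = [[rng.gauss(0, 0.5) for _ in range(2 * d)] for _ in range(hidden_units)]
b1 = [0.0] * hidden_units
w2 = [rng.gauss(0, 0.5) for _ in range(hidden_units)]
b2 = 0.0

# Scoring every item for one user under each similarity.
dot_scores = [dot_similarity(user, q) for q in items]
mlp_scores = [mlp_similarity(user, q, W1, b1, w2, b2) for q in items]

# Top item under the dot product, found here by brute force.
best_by_dot = max(range(n_items), key=lambda i: dot_scores[i])
```

In this sketch both scores are computed by looping over all items. With a dot product, that loop is a maximum inner product search, which is the structure efficient retrieval algorithms exploit; the MLP score has no such structure, so each user-item pair needs its own forward pass.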
