Fusion Strategies for Learning User Embeddings with Neural Networks

Growing amounts of online user data motivate the need for automated processing techniques. In the case of user ratings, one interesting option is to use neural networks that learn to predict a rating given an item and a user. While being trained for prediction, such a network also learns to map each user to a vector, a so-called user embedding. Such embeddings can be valuable, for example, for estimating user similarity. However, item and user information can be combined in neural networks in various ways, and it is unclear how the choice of combination affects the resulting embeddings. In this paper, we run an experiment on movie ratings data in which we analyze how several fusion strategies in neural networks affect embedding quality. To evaluate embedding quality, we propose a novel measure, Pair-Distance Correlation, which quantifies the condition that similar users should have similar embedding vectors. We find that the fusion strategy affects both prediction performance and embedding quality. Surprisingly, we find that prediction performance does not necessarily reflect embedding quality. This suggests that if embeddings are of interest, the common tendency to select models based on their prediction ability should be reconsidered.
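
The abstract names neither the specific fusion strategies nor the formula for Pair-Distance Correlation, so the following is only an illustrative sketch under stated assumptions. It shows two common ways a user vector and an item vector might be fused (concatenation and element-wise product, both chosen here as examples), and one plausible reading of the measure: the Pearson correlation between pairwise user distances in rating space and in embedding space. The function names and the choice of Euclidean distance and Pearson correlation are hypothetical and are not taken from the paper.

```python
import numpy as np


def fuse_concat(user_vec, item_vec):
    """Assumed fusion example: concatenate user and item vectors."""
    return np.concatenate([user_vec, item_vec], axis=-1)


def fuse_multiply(user_vec, item_vec):
    """Assumed fusion example: element-wise product (needs equal sizes)."""
    return user_vec * item_vec


def pairwise_distances(x):
    """Euclidean distance for every pair of rows (i < j), as a flat vector."""
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(x.shape[0], k=1)
    return dist[i, j]


def pair_distance_correlation_sketch(ratings, embeddings):
    """Hypothetical stand-in for the paper's measure, not its definition.

    Correlates user-pair distances in rating space with user-pair
    distances in embedding space. A high value means that users with
    similar ratings also have similar embedding vectors.
    """
    d_ratings = pairwise_distances(ratings)
    d_embed = pairwise_distances(embeddings)
    return float(np.corrcoef(d_ratings, d_embed)[0, 1])


# Synthetic data for illustration only: 20 users, 50 items, 8-dim embeddings.
rng = np.random.default_rng(0)
ratings = rng.normal(size=(20, 50))
unrelated_embeddings = rng.normal(size=(20, 8))

# Embeddings that preserve rating distances up to scale correlate perfectly.
score_scaled = pair_distance_correlation_sketch(ratings, 2.0 * ratings)
# Embeddings unrelated to the ratings give some other, data-dependent value.
score_unrelated = pair_distance_correlation_sketch(ratings, unrelated_embeddings)
```

Under this reading, a model could be selected by this score on held-out users rather than by prediction error alone, which is the kind of reconsideration the abstract argues for.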
