Large-scale user modeling with recurrent neural networks for music discovery on multiple time scales

The amount of content on online music streaming platforms is immense, and most users only access a tiny fraction of this content. Recommender systems are the application of choice to open up the collection to these users. Collaborative filtering has the disadvantage that it relies on explicit ratings, which are often unavailable, and generally disregards the temporal nature of music consumption. On the other hand, item co-occurrence algorithms, such as the recently introduced word2vec-based recommenders, are typically left without an effective user representation. In this paper, we present a new approach to model users through recurrent neural networks by sequentially processing consumed items, represented by any type of embeddings and other context features. This way we obtain semantically rich user representations, which capture a user’s musical taste over time. Our experimental analysis on large-scale user data shows that our model can be used to predict future songs a user will likely listen to, both in the short and long term.

[1]  Moses Charikar,et al.  Similarity estimation techniques from rounding algorithms , 2002, STOC '02.

[2]  Padhraic Smyth,et al.  KDD Cup and workshop 2007 , 2007, SKDD.

[3]  Arkadiusz Paterek,et al.  Improving regularized singular value decomposition for collaborative filtering , 2007 .

[4]  Yong Liu,et al.  Improved Recurrent Neural Networks for Session-based Recommendations , 2016, DLRS@RecSys.

[5]  Fei-Fei Li,et al.  Visualizing and Understanding Recurrent Networks , 2015, ArXiv.

[6]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[7]  Andrew L. Maas Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[8]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[9]  Makbule Gulcin Ozsoy,et al.  From Word Embeddings to Item Recommendation , 2016, ArXiv.

[10]  Heiga Zen,et al.  WaveNet: A Generative Model for Raw Audio , 2016, SSW.

[11]  Joshua Goodman,et al.  Classes for fast maximum entropy training , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[12]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[13]  David M. Blei,et al.  Factorization Meets the Item Embedding: Regularizing Matrix Factorization with Item Co-occurrence , 2016, RecSys.

[14]  Yehuda Koren,et al.  Matrix Factorization Techniques for Recommender Systems , 2009, Computer.

[15]  Yoshua Bengio,et al.  Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[16]  Tapani Raiko,et al.  International Conference on Learning Representations (ICLR) , 2016 .

[17]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[18]  Pradeep Dubey,et al.  Parallelizing Word2Vec in Multi-Core and Many-Core Architectures , 2016, ArXiv.

[19]  Bruno Ribeiro,et al.  Mining Online Music Listening Trajectories , 2016, ISMIR.

[20]  Jürgen Schmidhuber,et al.  LSTM: A Search Space Odyssey , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[21]  James Bennett,et al.  The Netflix Prize , 2007 .

[22]  Yifan Hu,et al.  Collaborative Filtering for Implicit Feedback Datasets , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[23]  Benjamin Schrauwen,et al.  Deep content-based music recommendation , 2013, NIPS.

[24]  Navdeep Jaitly,et al.  Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.

[25]  Thomas Demeester,et al.  Representation learning for very short texts using weighted word embedding aggregation , 2016, Pattern Recognit. Lett..

[26]  Alexandros Karatzoglou,et al.  Session-based Recommendations with Recurrent Neural Networks , 2015, ICLR.

[27]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[28]  Christopher C. Johnson Logistic Matrix Factorization for Implicit Feedback Data , 2014 .

[29]  Cícero Nogueira dos Santos,et al.  Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts , 2014, COLING.

[30]  Qiang Yang,et al.  One-Class Collaborative Filtering , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[31]  Yoshua Bengio,et al.  Learning to Understand Phrases by Embedding the Dictionary , 2015, TACL.

[32]  Yehuda Koren,et al.  Yahoo! music recommendations: modeling music ratings with temporal dynamics and item taxonomy , 2011, RecSys '11.

[33]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[34]  Thorsten Joachims,et al.  Taste Over Time: The Temporal Dynamics of User Preferences , 2013, ISMIR.

[35]  John Salvatier,et al.  Theano: A Python framework for fast computation of mathematical expressions , 2016, ArXiv.

[36]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[37]  Oren Barkan,et al.  ITEM2VEC: Neural item embedding for collaborative filtering , 2016, 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP).

[38]  Vaibhava Goel,et al.  Advances in Very Deep Convolutional Neural Networks for LVCSR , 2016, INTERSPEECH.

[39]  Lars Schmidt-Thieme,et al.  BPR: Bayesian Personalized Ranking from Implicit Feedback , 2009, UAI.