Multi-Task Learning of Graph-based Inductive Representations of Music Content

Music streaming platforms rely heavily on learning meaningful representations of tracks to surface apt recommendations to users across many use cases. In this work, we consider the task of learning music track representations by leveraging three rich, heterogeneous sources of information: (i) organizational information (e.g., playlist co-occurrence), (ii) content information (e.g., audio and acoustics), and (iii) music stylistics (e.g., genre). We advocate for a multi-task formulation of graph representation learning, and propose MUSIG: MUlti-task Sampling and Inductive learning on Graphs. MUSIG allows us to derive generalized track representations that combine the benefits of (i) the inductive graph-based framework, which generates embeddings by sampling and aggregating features from a node's local neighborhood, and (ii) multi-task training of the aggregation functions, which ensures that the learnt functions perform well on a number of important tasks. We present large-scale empirical results for track recommendation on the playlist completion task, and compare different classes of representation learning approaches, including collaborative filtering, word2vec and node embeddings, as well as graph embedding approaches. Our results demonstrate that considering content information (i.e., audio and acoustic features) is useful and that multi-task supervision helps learn better representations.
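The "sampling and aggregating" idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the MUSIG implementation: the function names, the toy graph, the mean aggregator, the concatenation step, and the weighted-sum combination of task losses are all assumptions made for illustration, and the learned weight matrices and nonlinearities that a trained aggregator would apply are omitted.

```python
import random

def sample_neighbors(graph, node, k, rng):
    """Sample k neighbors with replacement; an isolated node falls back to itself."""
    neighbors = graph.get(node, [])
    if not neighbors:
        return [node] * k
    return [rng.choice(neighbors) for _ in range(k)]

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def embed(graph, features, node, k, depth, rng):
    """Inductive embedding: concatenate a node's own representation with the
    mean of its sampled neighbors' representations, repeated `depth` times.
    Assumption: no learned weights, so the vector length doubles per level."""
    if depth == 0:
        return list(features[node])
    own = embed(graph, features, node, k, depth - 1, rng)
    sampled = sample_neighbors(graph, node, k, rng)
    neigh = mean_vector(
        [embed(graph, features, n, k, depth - 1, rng) for n in sampled]
    )
    return own + neigh  # list concatenation, not element-wise addition

def multi_task_loss(losses, weights):
    """Hypothetical multi-task objective: weighted sum of per-task losses."""
    return sum(weights[task] * loss for task, loss in losses.items())

# Toy data (hypothetical): tracks linked by playlist co-occurrence,
# each with a 2-dimensional content feature vector.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": []}
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [0.5, 0.5]}

rng = random.Random(0)
emb_b = embed(graph, features, "b", k=3, depth=1, rng=rng)
emb_d = embed(graph, features, "d", k=3, depth=1, rng=rng)
total = multi_task_loss({"playlist": 0.5, "genre": 0.25},
                        {"playlist": 1.0, "genre": 2.0})
```

Because the embedding is computed from features and sampled neighbors rather than looked up in a per-node table, the same function applies to a track that was not present during training, which is the sense in which such a framework is inductive.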
