Large-scale music tag recommendation with explicit multiple attributes

Social tagging can provide rich semantic information for large-scale retrieval in music discovery. Such collaborative intelligence, however, also generates a high degree of tags unhelpful to discovery, some of which obfuscate critical information. Towards addressing these shortcomings, tag recommendation for more robust music discovery is an emerging topic of significance for researchers. However, current methods do not consider diversity of music attributes, often using simple heuristics such as tag frequency for filtering out irrelevant tags. Music attributes encompass any number of perceived dimensions, for instance vocalness, genre, and instrumentation. Many of these are underrepresented by current tag recommenders. We propose a scheme for tag recommendation using Explicit Multiple Attributes based on tag semantic similarity and music content. In our approach, the attribute space is explicitly constrained at the outset to a set that minimizes semantic loss and tag noise, while ensuring attribute diversity. Once the user uploads or browses a song, the system recommends a list of relevant tags in each attribute independently. To the best of our knowledge, this is the first method to consider Explicit Multiple Attributes for tag recommendation. Our system is designed for large-scale deployment, on the order of millions of objects. For processing large-scale music data sets, we design parallel algorithms based on the MapReduce framework to perform large-scale music content and social tag analysis, train a model, and compute tag similarity. We evaluate our tag recommendation system on CAL-500 and a large-scale data set ($N = 77,448$ songs) generated by crawling Youtube and Last.fm. Our results indicate that our proposed method is both effective for recommending attribute-diverse relevant tags and efficient at scalable processing.

[1]  Dong Liu,et al.  Tag ranking , 2009, WWW '09.

[2]  Alexandr Andoni,et al.  Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[3]  Paul M. B. Vitányi,et al.  The Google Similarity Distance , 2004, IEEE Transactions on Knowledge and Data Engineering.

[4]  Thomas Hofmann,et al.  Map-Reduce for Machine Learning on Multicore , 2007 .

[5]  Roelof van Zwol,et al.  Flickr tag recommendation based on collective knowledge , 2008, WWW.

[6]  James Ze Wang,et al.  Real-Time Computerized Annotation of Pictures , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Edward Y. Chang,et al.  Effective image annotation via active learning , 2002, Proceedings. IEEE International Conference on Multimedia and Expo.

[8]  Nenghai Yu,et al.  WWW 2009 MADRID! Track: Rich Media / Session: Tagging and Clustering Learning to , 2022 .

[9]  Jimmy J. Lin Brute force and indexed approaches to pairwise document similarity comparisons with MapReduce , 2009, SIGIR.

[10]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[11]  Daniel Gatica-Perez,et al.  On image auto-annotation with latent space models , 2003, ACM Multimedia.

[12]  S. Griffis EDITOR , 1997, Journal of Navigation.

[13]  Changhu Wang,et al.  Learning to reduce the semantic gap in web image retrieval and annotation , 2008, SIGIR '08.

[14]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[15]  Jimmy J. Lin Scalable Language Processing Algorithms for the Masses: A Case Study in Computing Word Co-occurrence Matrices with MapReduce , 2008, EMNLP.

[16]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[17]  Communications Computer Systems 2000 ieee international conference on multimedia and expo - ICME 2000 , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[18]  Wei-Ying Ma,et al.  Annotating Images by Mining Image Search Results , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Ja-Ling Wu,et al.  SheepDog: group and tag recommendation for flickr photos by automatic search-based learning , 2008, ACM Multimedia.

[20]  Chin-Hui Lee,et al.  Enhancing image annotation by integrating concept ontology and text-based bayesian learning model , 2007, ACM Multimedia.

[21]  Gert R. G. Lanckriet,et al.  Semantic Annotation and Retrieval of Music and Sound Effects , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[22]  Gert R. G. Lanckriet,et al.  Towards musical query-by-semantic-description using the CAL500 data set , 2007, SIGIR.

[23]  Bingjun Zhang,et al.  Comprehensive query-dependent fusion using regression-on-folksonomies: a case study of multimodal music search , 2009, ACM Multimedia.

[24]  George Tzanetakis,et al.  Improving automatic music tag annotation using stacked generalization of probabilistic SVM outputs , 2009, ACM Multimedia.

[25]  Bingjun Zhang,et al.  CompositeMap: a novel framework for music similarity measure , 2009, SIGIR.

[26]  Craig MacDonald,et al.  Comparing Distributed Indexing: To MapReduce or Not? , 2009, LSDS-IR@SIGIR.

[27]  Perry R. Cook,et al.  Easy As CBA: A Simple Probabilistic Model for Tagging Music , 2009, ISMIR.

[28]  George Tzanetakis,et al.  MARSYAS: a framework for audio analysis , 1999, Organised Sound.

[29]  Thierry Bertin-Mahieux,et al.  Autotagger: A Model for Predicting Social Tags from Acoustic Features on Large Music Databases , 2008 .