Music retagging using label propagation and robust principal component analysis

The emergence of social tagging websites such as Last.fm has provided new opportunities for learning computational models that automatically tag music. Researchers typically obtain music tags from the Internet and use them to construct machine learning models. Such tags, however, are usually noisy and sparse. In this paper, we present a preliminary study that aims to refine (retag) social tags by exploiting the content similarity between tracks and the semantic redundancy of the track-tag matrix. The evaluated algorithms include a graph-based label propagation method that is often used in semi-supervised learning and a robust principal component analysis (PCA) algorithm that has led to state-of-the-art results in matrix completion. The results indicate that robust PCA with a content similarity constraint is particularly effective: it improves the robustness of tagging against three types of synthetic errors and boosts the recall rate of music auto-tagging by 7% in a real-world setting.
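As a rough illustration of the graph-based label propagation idea mentioned above, the following minimal sketch spreads tag scores between similar tracks. It assumes the common "local and global consistency" iteration F <- alpha * S * F + (1 - alpha) * Y, which may differ from the exact variant evaluated in the paper. The function name `propagate_tags`, the parameters `alpha` and `n_iter`, and the toy similarity and tag values are all hypothetical.

```python
import math


def propagate_tags(similarity, tags, alpha=0.5, n_iter=50):
    """Spread tag scores over a track-similarity graph (illustrative sketch).

    similarity: n x n symmetric list of lists with non-negative entries.
    tags:       n x m list of lists, the (noisy, sparse) track-tag matrix Y.
    alpha:      assumed trade-off between neighbor scores and original tags.
    """
    n = len(similarity)
    n_tags = len(tags[0])

    # Zero the diagonal so a track does not reinforce its own tags.
    w = [[0.0 if i == j else float(similarity[i][j]) for j in range(n)]
         for i in range(n)]

    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}.
    degree = [sum(row) for row in w]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degree]
    s = [[inv_sqrt[i] * w[i][j] * inv_sqrt[j] for j in range(n)]
         for i in range(n)]

    y = [[float(v) for v in row] for row in tags]
    f = [row[:] for row in y]

    # Fixed-point iteration; it converges for 0 <= alpha < 1.
    for _ in range(n_iter):
        f = [[alpha * sum(s[i][k] * f[k][t] for k in range(n))
              + (1.0 - alpha) * y[i][t]
              for t in range(n_tags)]
             for i in range(n)]
    return f


# Toy data (made up): tracks 0 and 1 sound alike, track 2 is different.
similarity = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
# Columns: "rock", "jazz". Track 1 has no tags yet.
tags = [
    [1, 0],
    [0, 0],
    [0, 1],
]

scores = propagate_tags(similarity, tags)
for track, row in enumerate(scores):
    print(track, [round(v, 3) for v in row])
```

In this toy setting the untagged track 1 ends up with a higher "rock" than "jazz" score because it is acoustically closer to track 0. The content-constrained robust PCA approach that the abstract reports as most effective is not sketched here.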
