Learning to Tag using Noisy Labels

To organize and retrieve the ever-growing collection of multimedia objects on the Web, many algorithms have been developed to automatically tag images, music and videos. One source of labeled data for training these algorithms is tags collected from the Web, via collaborative tagging websites (e.g., Flickr, Last.FM and YouTube) or crowdsourcing applications (e.g., human computation games and Mechanical Turk). A common approach is to use tags directly as labels for training algorithms in a supervised way. This approach is problematic because synonyms and misspellings among the tags create an overly fragmented label space: a huge number of classes, many of which are sparse and semantically equivalent to one another. In this work, we investigate a method for training tagging algorithms using a reduced set of labels corresponding to topics derived from the tags. We show that our proposed method is comparable, in terms of annotation and retrieval performance, to the method of using tags directly as labels, while being more efficient to train (as there are fewer classes) and less wasteful (eliminating the need to discard tags that are associated with too few examples). We demonstrate our results using a dataset collected by a human computation game called TagATune.
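
The contrast between using tags directly as labels and using topic labels can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the clips, the tags, the `min_examples` threshold, and above all the hand-written `tag_to_topic` mapping, which merely stands in for topics that would be derived from the tags themselves. It is not the method evaluated in this work.

```python
from collections import Counter

# Hypothetical tag sets for four clips (not taken from the TagATune dataset).
clip_tags = {
    "clip1": ["classical", "violin", "strings"],
    "clip2": ["clasical", "violins"],
    "clip3": ["rock", "guitar", "loud"],
    "clip4": ["rok", "electric guitar"],
}

# Hypothetical tag-to-topic assignment, written by hand for this sketch.
# It stands in for topics that would be derived from the tags.
tag_to_topic = {
    "classical": 0, "clasical": 0, "violin": 0, "violins": 0, "strings": 0,
    "rock": 1, "rok": 1, "guitar": 1, "electric guitar": 1, "loud": 1,
}


def tags_as_labels(clip_tags, min_examples):
    """Use tags directly as classes, dropping tags seen on too few clips."""
    counts = Counter(tag for tags in clip_tags.values() for tag in set(tags))
    kept = {tag for tag, n in counts.items() if n >= min_examples}
    return {clip: sorted(set(tags) & kept) for clip, tags in clip_tags.items()}


def topics_as_labels(clip_tags, tag_to_topic, num_topics):
    """Turn each clip's tags into a normalized distribution over topics."""
    labels = {}
    for clip, tags in clip_tags.items():
        counts = [0] * num_topics
        for tag in tags:
            counts[tag_to_topic[tag]] += 1
        total = sum(counts)
        labels[clip] = [c / total for c in counts]
    return labels


direct = tags_as_labels(clip_tags, min_examples=2)
topical = topics_as_labels(clip_tags, tag_to_topic, num_topics=2)
print(direct)
print(topical)
```

In this toy data every tag occurs on only one clip, so thresholding the raw tags leaves each clip without any label, whereas the two topic labels retain the information carried by the misspelled and synonymous tags.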
