Automatic Music Tagging With Time Series Models

Author(s): Coviello, Emanuele

Abstract: As music distribution has evolved from physical media to digital content, tens of millions of songs are instantly available to consumers through several online services. To help users search, browse, and discover songs in these extensive collections, music information retrieval systems have been developed to automatically analyze, index, and recommend musical content. This dissertation proposes machine learning methods for content-based automatic tagging of music and evaluates their performance on music annotation and retrieval tasks. The proposed methods rely on time-series models of the musical signal to account for the longer-term temporal dynamics of music in addition to timbral textures, and allow a single system to leverage different types of models and information at multiple time scales. Efficient algorithms for estimation and deployment are proposed for all the considered methods.
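The time-series modeling the abstract refers to builds on linear dynamical systems (dynamic textures) over sequences of audio feature vectors. The following is a minimal illustrative sketch of such a generative model, not the dissertation's actual implementation; all dimensions, parameter values, and variable names here are chosen purely for illustration.

```python
import numpy as np

# Illustrative dynamic texture: a linear dynamical system generating a
# sequence of audio feature frames (e.g., Mel-frequency features).
#   state:       x_{t+1} = A x_t + v_t,   v_t ~ N(0, Q)
#   observation: y_t     = C x_t + w_t,   w_t ~ N(0, R)
# All parameters below are hypothetical placeholders.

rng = np.random.default_rng(0)

n_state, n_obs, T = 4, 13, 100  # hidden-state dim, feature dim, number of frames

A = 0.9 * np.eye(n_state)                   # stable state transition (|eigenvalues| < 1)
C = rng.standard_normal((n_obs, n_state))   # observation (feature-mapping) matrix
Q = 0.1 * np.eye(n_state)                   # state noise covariance
R = 0.1 * np.eye(n_obs)                     # observation noise covariance

x = rng.standard_normal(n_state)            # initial hidden state
Y = np.empty((T, n_obs))
for t in range(T):
    Y[t] = C @ x + rng.multivariate_normal(np.zeros(n_obs), R)
    x = A @ x + rng.multivariate_normal(np.zeros(n_state), Q)

print(Y.shape)  # one synthetic "fragment" of T feature frames
```

In an annotation system of this kind, one such model would typically be fit per semantic tag, and a new song fragment would be scored under each tag's model; the hidden state captures the temporal evolution of the features that a bag-of-frames model discards.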
