Nonnegative matrix factorization based self-taught learning with application to music genre classification

Availability of large amounts of raw unlabeled data has sparked the recent surge in semi-supervised learning research. In most works, however, it is assumed that labeled and unlabeled data come from the same distribution. This restriction is removed in the self-taught learning approach where unlabeled data can be different, but nevertheless have similar structure. First, a representation is learned from the unlabeled data via non-negative matrix factorization (NMF) and then it is applied to the labeled data used for classification. In this work, we implemented this method for the music genre classification task using two different databases: one as unlabeled data pool and the other for supervised classifier training. Music pieces come from 10 and 6 genres for each database respectively, while only one genre is common for both of them. Results from wide variety of experimental settings show that the self-taught learning method improves the classification rate when the amount of labeled data is small and, more interestingly, that consistent improvement can be achieved for a wide range of unlabeled data sizes.

[1]  Remco C. Veltkamp,et al.  A Survey of Music Information Retrieval Systems , 2005, ISMIR.

[2]  Rajat Raina,et al.  Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.

[3]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[4]  Rémi Gribonval,et al.  Sparse Representations in Audio and Music: From Coding to Source Separation , 2010, Proceedings of the IEEE.

[5]  Xavier Serra,et al.  ISMIR 2004 Audio Description Contest , 2006 .

[6]  P. Smaragdis,et al.  Non-negative matrix factorization for polyphonic music transcription , 2003, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684).

[7]  Tomoko Matsui,et al.  Music genre classification using self-taught learning via sparse coding , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[8]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[9]  Nancy Bertin,et al.  Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis , 2009, Neural Computation.

[10]  George Tzanetakis,et al.  Musical genre classification of audio signals , 2002, IEEE Trans. Speech Audio Process..

[11]  Yann LeCun,et al.  Unsupervised Learning of Sparse Features for Scalable Audio Classification , 2011, ISMIR.

[12]  Christian Schörkhuber CONSTANT-Q TRANSFORM TOOLBOX FOR MUSIC PROCESSING , 2010 .

[13]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[14]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[15]  Marc Leman,et al.  Content-Based Music Information Retrieval: Current Directions and Future Challenges , 2008, Proceedings of the IEEE.

[16]  Tuomas Virtanen,et al.  Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria , 2007, IEEE Transactions on Audio, Speech, and Language Processing.