High level feature extraction for the self-taught learning algorithm

The availability of large amounts of raw unlabeled data has sparked the recent surge in semi-supervised learning research. Most works, however, assume that labeled and unlabeled data come from the same distribution. This restriction is removed in the self-taught learning algorithm, where the unlabeled data may come from a different distribution but nevertheless have similar structure. First, a representation is learned from the unlabeled samples by decomposing their data matrix into two matrices, called the bases matrix and the activations matrix respectively. This procedure is justified by the assumption that each sample is a linear combination of the columns of the bases matrix, which can be viewed as high-level features representing the knowledge learned from the unlabeled data in an unsupervised way. Next, activations of the labeled data are obtained using the bases, which are kept fixed. Finally, a classifier is built using these activations instead of the original labeled data. In this work, we investigated the performance of three popular matrix decomposition methods: Principal Component Analysis (PCA), Non-negative Matrix Factorization (NMF), and Sparse Coding (SC), as unsupervised high-level feature extractors for the self-taught learning algorithm. We implemented this algorithm for the music genre classification task using two different databases: one as the unlabeled data pool and the other as data for supervised classifier training. The music pieces come from 10 and 6 genres respectively, with only one genre common to both databases. Results from a wide variety of experimental settings show that the self-taught learning method improves the classification rate when the amount of labeled data is small and, more interestingly, that consistent improvement can be achieved across a wide range of unlabeled data sizes. Among the matrix decomposition approaches, Sparse Coding performed best.
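The three steps of the pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the data is random, the dimensions are arbitrary, and scikit-learn's `DictionaryLearning` stands in for the sparse coding stage (the paper's PCA and NMF variants would slot in the same way), with a linear SVM as the supervised classifier.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)

# Hypothetical stand-ins: an unlabeled pool (one database) and a small
# labeled set from another database with 6 genre labels.
X_unlabeled = rng.rand(200, 40)
X_labeled = rng.rand(30, 40)
y_labeled = rng.randint(0, 6, size=30)

# Step 1: learn the bases from unlabeled data, i.e. find matrices such
# that X_unlabeled ~= activations @ bases, with sparse activations.
dico = DictionaryLearning(n_components=20, alpha=1.0, max_iter=10,
                          random_state=0)
dico.fit(X_unlabeled)

# Step 2: with the bases kept fixed, compute activations (sparse codes)
# of the labeled samples.
A_labeled = dico.transform(X_labeled)

# Step 3: train the classifier on the activations instead of the raw
# labeled features.
clf = LinearSVC().fit(A_labeled, y_labeled)
print(A_labeled.shape)  # each labeled sample is now a 20-dim activation
```

The same template covers the PCA and NMF extractors by swapping the decomposition class; only the constraints on the bases and activations (orthogonality, non-negativity, sparsity) differ.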
