Pattern classification formulated as a missing data task: The audio genre classification case

This paper formulates the classification of patterns into a predefined set of classes as a missing data task. This is achieved by first augmenting the feature vector of each training pattern with the binary codeword that represents its class. A Restricted Boltzmann Machine (RBM) or a Dictionary Learning (DL) algorithm is then trained on the augmented feature space. During the classification stage, the binary codeword of the unknown pattern is treated as missing data. In the case of the RBM, the codeword is filled in by means of an alternating Gibbs sampling procedure. In the case of the DL method, the set of atoms in the dictionary is first learned from the training data, and the label of the unknown pattern is then predicted from the atoms that represent this pattern. Applying the method to an audio genre classification task shows that the obtained results are highly competitive with state-of-the-art methods. Moreover, the DL approach lends itself readily to online implementations, in line with the current trend in big data applications.
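
The missing-data formulation can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions that are not in the abstract: the binary codeword is taken to be one-hot, the augmented training vectors themselves stand in for learned dictionary atoms, and a ridge-regularised least-squares fit stands in for a sparse coder. The function names and the toy data are hypothetical. The unknown pattern is coded using only the feature rows of the dictionary, and the codeword rows are then used to fill in the missing label part.

```python
import numpy as np


def one_hot(labels, n_classes):
    """Binary codeword per pattern (one-hot coding is an assumption)."""
    codes = np.zeros((len(labels), n_classes))
    codes[np.arange(len(labels)), labels] = 1.0
    return codes


def build_dictionary(X_train, y_train, n_classes):
    """Augment each feature vector with its codeword.

    Assumption: the augmented training vectors are used directly as atoms,
    standing in for a dictionary learned by a DL algorithm.
    Returns an array of shape (n_features + n_classes, n_atoms).
    """
    augmented = np.hstack([X_train, one_hot(y_train, n_classes)])
    return augmented.T


def predict(D, x, n_features, lam=1e-2):
    """Treat the codeword of x as missing and fill it in from the atoms."""
    D_x = D[:n_features]   # rows covering the observed features
    D_c = D[n_features:]   # rows covering the (missing) codeword
    # Ridge-regularised code computed from the observed part only
    # (a stand-in for sparse coding; lam is an arbitrary illustrative value).
    gram = D_x @ D_x.T + lam * np.eye(n_features)
    code = D_x.T @ np.linalg.solve(gram, x)
    # Reconstruct the missing codeword and pick its largest entry.
    codeword = D_c @ code
    return int(np.argmax(codeword)), codeword


# Toy data: two well-separated classes in a 2-D feature space.
X_train = np.array([[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]])
y_train = np.array([0, 0, 1, 1])
D = build_dictionary(X_train, y_train, n_classes=2)

label, codeword = predict(D, np.array([0.8, 0.05]), n_features=2)
print(label, codeword)
```

In the RBM variant described above, the same augmented vectors would be clamped on the feature units while the codeword units are resampled by alternating Gibbs sampling; that variant is not shown here.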
