Automatic audio tag classification via semi-supervised canonical density estimation

We propose a novel semi-supervised method for building a statistical model of the relationship between sounds and text labels (“tags”). The method, named semi-supervised canonical density estimation, uses unlabeled sound data in two ways: 1) a low-dimensional latent space representing topics of sounds is extracted by a semi-supervised variant of canonical correlation analysis, and 2) topic models are learned in that topic space by a multi-class extension of semi-supervised kernel density estimation. Real-world audio tagging experiments indicate that the proposed method improves tagging accuracy even when only a small number of labeled sounds is available.
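The two-stage idea can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes one plausible formulation of the first stage: a generalized eigenproblem that blends a CCA term computed on labeled sound/tag pairs with a PCA term computed on all sounds, weighted by a hypothetical trade-off parameter `beta`. For brevity the second stage is simplified to a per-tag Gaussian kernel density over labeled sounds only, so the semi-supervised part of the density estimation is omitted. All function names, parameters, and defaults are illustrative assumptions.

```python
import numpy as np

def semi_cca_sketch(X_lab, Y_lab, X_all, beta=0.5, dim=2, reg=1e-3):
    """Project sounds into a latent space (illustrative, assumed formulation).

    X_lab: (n, dx) features of labeled sounds; Y_lab: (n, dy) binary tag matrix;
    X_all: (N, dx) features of all sounds, labeled and unlabeled.
    Returns the sound-side projection Wx (dx, dim) and the feature mean.
    """
    mx, my = X_all.mean(axis=0), Y_lab.mean(axis=0)
    Xl, Yl, Xa = X_lab - mx, Y_lab - my, X_all - mx
    n, dx, dy = len(Xl), Xl.shape[1], Yl.shape[1]
    Sxx_l, Syy, Sxy = Xl.T @ Xl / n, Yl.T @ Yl / n, Xl.T @ Yl / n
    Sxx_a = Xa.T @ Xa / len(Xa)  # covariance over all sounds (PCA term)
    # Left-hand matrix: cross-covariance (CCA) blended with variance (PCA).
    A = np.block([[(1 - beta) * Sxx_a, beta * Sxy],
                  [beta * Sxy.T, (1 - beta) * Syy]])
    # Right-hand matrix: block-diagonal and positive definite by construction.
    B = np.block([[beta * Sxx_l + (1 - beta + reg) * np.eye(dx), np.zeros((dx, dy))],
                  [np.zeros((dy, dx)), beta * Syy + (1 - beta + reg) * np.eye(dy)]])
    # Solve A w = lambda B w by whitening with the Cholesky factor of B.
    L_inv = np.linalg.inv(np.linalg.cholesky(B))
    C = L_inv @ A @ L_inv.T
    _, vecs = np.linalg.eigh((C + C.T) / 2)
    W = L_inv.T @ vecs[:, ::-1][:, :dim]  # eigenvectors of the largest eigenvalues
    return W[:dx], mx

def kde_tag_scores(Z_lab, Y_lab, Z_query, bandwidth=1.0):
    """Per-tag Gaussian kernel density in the latent space, normalized per query."""
    d2 = ((Z_query[:, None, :] - Z_lab[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / (2 * bandwidth ** 2))
    dens = K @ Y_lab / np.maximum(Y_lab.sum(axis=0), 1)  # mean kernel per tag
    return dens / np.maximum(dens.sum(axis=1, keepdims=True), 1e-12)
```

In use, a query sound with features `x` would be projected as `(x - mx) @ Wx` and passed to `kde_tag_scores` together with the projected labeled sounds; each output row then ranks the candidate tags for that query.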
