Automatic Audio Tagging and Retrieval Using Semi-Supervised Canonical Density Estimation

We apply semi-supervised canonical density estimation (SSCDE), a semi-supervised learning method based on topic modeling, to audio tagging and retrieval. SSCDE was originally proposed for image annotation and retrieval, but it can also be applied to audio data. The method consists of two parts: 1) extracting a low-dimensional latent space that represents the topics of sounds, using a semi-supervised variant of canonical correlation analysis, and 2) learning a topic model in that latent space, using a multi-class extension of semi-supervised kernel density estimation. Audio tagging experiments with real-world data indicate that SSCDE improves annotation accuracy even when only a small number of tagged sounds are available.
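
The two-part structure described above can be illustrated with a rough sketch. It is not the SSCDE algorithm itself: it assumes ordinary (fully supervised, regularized) canonical correlation analysis in place of the semi-supervised variant, and a plain Gaussian kernel density estimate per tag in place of the semi-supervised multi-class extension. The function names `cca_projection` and `tag_scores`, the bandwidth, the regularizer, and the synthetic toy data are all hypothetical choices made for illustration only.

```python
import numpy as np

def cca_projection(X, Y, dim, reg=1e-3):
    """Plain regularized CCA (assumption: stands in for the semi-supervised variant).

    Returns the feature mean and a matrix projecting centered features
    onto the top `dim` canonical directions.
    """
    mx = X.mean(axis=0)
    Xc = X - mx
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whiten both views with Cholesky factors, then take the SVD
    # of the whitened cross-covariance.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    A = np.linalg.solve(Lx.T, U[:, :dim])
    return mx, A

def tag_scores(Zq, Ztr, Y, bandwidth=0.5):
    """Per-tag Gaussian kernel density scores in the latent space.

    K @ Y sums the kernel over training sounds carrying each tag,
    which is proportional to p(z | tag) * p(tag). Rows are normalized
    so the scores for each query sum to one.
    """
    d2 = ((Zq[:, None, :] - Ztr[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-0.5 * d2 / bandwidth**2)
    S = K @ Y
    return S / (S.sum(axis=1, keepdims=True) + 1e-12)

# Synthetic toy data (not from the paper): 40 "sounds" with 5 features,
# two tags, one cluster per tag.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
X[:20, 0] += 4.0
X[20:, 0] -= 4.0
Y = np.zeros((40, 2))
Y[:20, 0] = 1.0
Y[20:, 1] = 1.0

# Part 1: latent space from features and tags.
mx, A = cca_projection(X, Y, dim=1)
Z = (X - mx) @ A

# Part 2: density-based tag scores for two untagged query sounds.
queries = np.array([[4.0, 0, 0, 0, 0], [-4.0, 0, 0, 0, 0]])
scores = tag_scores((queries - mx) @ A, Z, Y)
predicted = scores.argmax(axis=1)
```

In this sketch, tagging a new sound means ranking tags by their score in `scores`; retrieval by tag could rank sounds by the corresponding column instead. The semi-supervised variants described in the abstract would additionally make use of untagged sounds in both steps, which this simplified version does not attempt.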