Non-Linear Semantic Embedding for Organizing Large Instrument Sample Libraries

Though tags and metadata may provide rich indicators of relationships between high-level concepts such as songs, artists, or even genres, verbal descriptors lack the fine-grained detail needed to capture the acoustic nuances on which efficient retrieval of sounds from extremely large sample libraries depends. To this end, we present a flexible approach called Non-Linear Semantic Embedding (NLSE), which projects high-dimensional time-frequency representations of musical instrument samples into a low-dimensional, semantically organized metric space. Unlike other dimensionality reduction techniques, NLSE incorporates extrinsic semantic information when learning the projection, automatically learns salient acoustic features, and generates an intuitively meaningful output space.
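To make the idea of a semantically supervised projection concrete, the following is a minimal sketch and not the method described in this work. It assumes a pairwise contrastive objective, in which samples sharing a semantic label (for example, the same instrument class) are pulled together and samples with different labels are pushed at least a margin apart. The two-layer network, the input size of 252 time-frequency bins, the three-dimensional output, and all function and parameter names are illustrative assumptions.

```python
import numpy as np

def embed(x, w1, b1, w2, b2):
    """Hypothetical non-linear projection: one tanh hidden layer, linear output."""
    hidden = np.tanh(x @ w1 + b1)
    return hidden @ w2 + b2

def contrastive_loss(z_a, z_b, same_label, margin=1.0):
    """Assumed pairwise objective on embedded points.

    z_a, z_b   : arrays of shape (n_pairs, n_dims), the two members of each pair
    same_label : boolean array of shape (n_pairs,), True if the pair shares a label
    """
    dist = np.linalg.norm(z_a - z_b, axis=1)
    pull = 0.5 * dist ** 2                            # similar pairs: shrink distance
    push = 0.5 * np.maximum(0.0, margin - dist) ** 2  # dissimilar pairs: enforce margin
    return np.where(same_label, pull, push).mean()

# Illustrative sizes only (assumptions, not values from the paper).
n_bins, n_hidden, n_out = 252, 64, 3
rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.1, size=(n_bins, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(scale=0.1, size=(n_hidden, n_out))
b2 = np.zeros(n_out)

# Random stand-ins for time-frequency frames of paired samples.
x_a = rng.normal(size=(4, n_bins))
x_b = rng.normal(size=(4, n_bins))
same_label = np.array([True, False, True, False])

z_a = embed(x_a, w1, b1, w2, b2)
z_b = embed(x_b, w1, b1, w2, b2)
loss = contrastive_loss(z_a, z_b, same_label)
```

In a sketch like this, the weights would be fitted by minimizing the loss with a gradient-based optimizer, after which nearest-neighbor search in the low-dimensional output space could serve as the retrieval step.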
