Self-Learning Acoustic Feature Generation and Selection for the Discrimination of Musical Signals

Which acoustic features are optimal for the discrimination of musical signals is widely debated. We therefore present a self-learning approach to this problem based on time series analysis and evolutionary feature-space optimization. The feature basis is formed by a multiplicity of dynamic acoustic Low-Level Descriptors (LLDs) such as pitch, intensity, and spectral information. These are filtered and pre-processed with particular regard to human perception. From these contours, further contours and functionals are systematically derived by means of descriptive statistics. The resulting high-dimensional space of static features is then optimized by a combined sequential floating and genetic search. As the learning function we apply Support Vector Machines, which are known to perform well on this task. To allow greater flexibility, we additionally integrate the alteration and combination of attributes by mathematical operations. The applicability of the proposed approach is demonstrated by extensive test runs on large public databases of musical signals containing, among others, segments of drum beats, a cappella singing, and multi-instrumental phrases. Outstanding performance is achieved for the discrimination of the signal type within a stream.
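To make the LLD-to-functionals step concrete, the following is a minimal sketch in Python of how static features can be derived from dynamic LLD contours by descriptive statistics. The particular statistics chosen here (moments, extrema, quartiles, delta magnitude) are a plausible illustrative set, not the paper's exact feature list, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats

def functionals(contour: np.ndarray) -> np.ndarray:
    """Map one time-varying LLD contour to a fixed-length static vector."""
    return np.array([
        contour.mean(),
        contour.std(),
        contour.min(),
        contour.max(),
        contour.max() - contour.min(),      # range
        np.percentile(contour, 25),
        np.percentile(contour, 75),
        stats.skew(contour),
        stats.kurtosis(contour),
        np.mean(np.abs(np.diff(contour))),  # mean absolute frame-to-frame delta
    ])

def segment_features(llds: dict[str, np.ndarray]) -> np.ndarray:
    """Stack the functionals of several LLD contours (e.g. pitch, intensity,
    spectral centroid) into one static feature vector per musical segment."""
    return np.concatenate([functionals(c) for c in llds.values()])
```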
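The feature-space optimization can likewise be illustrated. Below is a minimal sketch of the sequential floating part of the search, with an SVM wrapper as the learning function, assuming scikit-learn; the genetic-search stage and the attribute alteration/combination operations are omitted for brevity, and all names are illustrative rather than taken from the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def wrapper_score(X, y, subset):
    """Cross-validated SVM accuracy on the given feature subset."""
    if not subset:
        return 0.0
    return cross_val_score(SVC(kernel="rbf"), X[:, sorted(subset)], y, cv=5).mean()

def sffs(X, y, k_target):
    """Sequential forward floating selection: after each forward inclusion,
    conditionally exclude features while that yields a better subset of the
    resulting size than any seen before (the standard termination guard)."""
    selected = set()
    best_at_size = {}  # best wrapper score seen per subset size
    while len(selected) < k_target:
        # forward step: include the single best remaining feature
        remaining = [f for f in range(X.shape[1]) if f not in selected]
        f_add = max(remaining, key=lambda f: wrapper_score(X, y, selected | {f}))
        selected.add(f_add)
        best_at_size[len(selected)] = max(best_at_size.get(len(selected), 0.0),
                                          wrapper_score(X, y, selected))
        # floating steps: conditional exclusion of the least useful feature
        while len(selected) > 2:
            f_rm = max(selected, key=lambda f: wrapper_score(X, y, selected - {f}))
            score = wrapper_score(X, y, selected - {f_rm})
            if score > best_at_size.get(len(selected) - 1, 0.0):
                selected.remove(f_rm)
                best_at_size[len(selected)] = score
            else:
                break
    return sorted(selected), wrapper_score(X, y, selected)
```

In a full pipeline of the kind the abstract describes, a subset found this way would be refined further by a genetic search over the same wrapper objective.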
