Drumloop Separation Using Adaptive Spectrogram Templates

The separation of drumsounds from drumloops is a desirable signal processing functionality with a wide variety of applications in music production and music video games. Here, the term drumsound denotes the sound that is audible when a drum or percussion instrument is hit. Since recognition of the involved drums is a prerequisite for separation, the instrument types and their corresponding onsets must be detected first. Although machine-learning based classification of isolated drumsounds has been shown to be feasible, it is not directly applicable to the problem of drumloop separation. The main challenge is the strong overlap of drum spectra when two or more drumsounds share the same onset. This overlap leads to erroneous estimation of the involved instruments, e.g., a tom and a hi-hat sounding simultaneously could easily be misclassified as a snare. Two main families of approaches have been proposed in the literature to overcome this problem: template matching [1], [2] and decomposition-based methods [3], [4], [5]. We pursue the template matching approach, but combine it with a Non-Negative Matrix Factorization (NMF) in order to derive an initial estimate of the spectrogram templates of the involved drumsounds. A heuristic update rule for the templates is described, as well as an expectation-maximization approach to the quasi-transcription. Due to space limits in this publication, a formal evaluation is omitted.
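To illustrate how an NMF can yield initial spectral templates, the following is a minimal sketch and not the method of this paper. The abstract does not state which NMF variant or cost function is used; the sketch assumes the standard multiplicative updates for the Euclidean cost. The function name `nmf`, the matrices `W_true` and `H_true`, and the toy "kick" and "hi-hat" spectra are hypothetical and serve only as synthetic input, not as measured data or results.

```python
import numpy as np

def nmf(V, n_components, n_iter=500, eps=1e-9, seed=0):
    """Factorize a non-negative matrix V (bins x frames) as V ~ W @ H.

    Assumption: multiplicative updates for the Euclidean cost; the
    abstract does not specify the variant actually used.
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    # Random strictly positive initialization of templates and activations
    W = rng.random((n_bins, n_components)) + eps
    H = rng.random((n_components, n_frames)) + eps
    for _ in range(n_iter):
        # Update activations, then templates; eps avoids division by zero
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy spectra: column 0 is low-frequency ("kick"),
# column 1 is high-frequency ("hi-hat"). Rows are frequency bins.
W_true = np.array([[1.0, 0.0],
                   [0.8, 0.1],
                   [0.3, 0.2],
                   [0.1, 0.6],
                   [0.0, 1.0],
                   [0.0, 0.7]])

# Hypothetical activations over 8 frames; both drums share onsets
# at frames 0 and 4, which is the overlapping case described above.
H_true = np.array([[1.0, 0.5, 0.25, 0.1, 1.0, 0.5, 0.25, 0.1],
                   [1.0, 0.2, 1.0, 0.2, 1.0, 0.2, 1.0, 0.2]])

V = W_true @ H_true            # synthetic magnitude spectrogram
W, H = nmf(V, n_components=2)  # columns of W are the estimated templates

# Relative reconstruction error of the factorization
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", rel_err)
```

In this sketch each column of `W` plays the role of an initial spectral template and each row of `H` the corresponding activation over time. The order and scaling of the components are arbitrary, so assigning a column to a particular drum would need an additional step.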