Paper Information - Player Vs Transcriber: A Game Approach To Data Manipulation For Automatic Drum Transcription

Player Vs Transcriber: A Game Approach To Data Manipulation For Automatic Drum Transcription

State-of-the-art automatic drum transcription (ADT) approaches utilise deep learning methods that rely on time-consuming manual annotations and require congruence between training and testing data. When these conditions are not met, they often fail to generalise. We propose a game approach to ADT, termed player vs transcriber (PvT), in which a player model aims to reduce the transcription accuracy of a transcriber model by manipulating training data in two ways. First, existing data may be augmented, allowing the transcriber to be trained using recordings with modified timbres. Second, additional individual recordings from sample libraries are included to generate rare combinations. We present three versions of the PvT model: AugExist, which augments pre-existing recordings; AugAddExist, which adds additional samples of drum hits to the AugExist system; and Generate, which generates training examples exclusively from individual drum hits from sample libraries. The three versions are evaluated alongside a state-of-the-art deep learning ADT system using two evaluation strategies. The results demonstrate that including the player network improves ADT performance and suggest that this is due to improved generalisability. The results also indicate that although the Generate model achieves relatively low results, it is a viable choice when annotations are not accessible.
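To make the idea behind the Generate version more concrete, the following is a minimal sketch of building one training example from individual drum hits. It is not the authors' implementation: the function names `render_drum_pattern` and `random_onsets`, the list-based audio representation, and the uniform random choice of onsets are all assumptions made for illustration. In the paper a learned player model decides how the data is manipulated, whereas here a plain random sampler stands in for it.

```python
import random


def render_drum_pattern(hits, onsets, length):
    """Mix one-shot drum samples into one signal and build onset labels.

    hits:   dict of instrument name -> list of float samples (one drum hit)
    onsets: dict of instrument name -> list of start indices for that hit
    length: number of samples in the rendered example
    (All names here are hypothetical, chosen for this sketch.)
    """
    mix = [0.0] * length
    labels = {name: [0] * length for name in hits}
    for name, positions in onsets.items():
        sample = hits[name]
        for start in positions:
            labels[name][start] = 1  # mark the onset for this instrument
            for i, value in enumerate(sample):
                if start + i >= length:
                    break  # truncate hits that run past the end
                mix[start + i] += value
    return mix, labels


def random_onsets(names, length, n_events, rng):
    """Pick random onset positions; each event triggers a random subset
    of instruments, so uncommon simultaneous combinations can occur.
    This uniform sampler is a stand-in for the learned player model."""
    onsets = {name: [] for name in names}
    for _ in range(n_events):
        start = rng.randrange(length)
        k = rng.randint(1, len(names))
        for name in rng.sample(names, k):
            if start not in onsets[name]:
                onsets[name].append(start)
    return onsets


if __name__ == "__main__":
    # Toy "sample library": very short synthetic hits instead of real audio.
    hits = {"kick": [1.0, 0.5], "snare": [0.25], "hihat": [0.125]}
    rng = random.Random(0)
    onsets = random_onsets(list(hits), length=16, n_events=4, rng=rng)
    mix, labels = render_drum_pattern(hits, onsets, length=16)
    print(onsets)
    print(mix)
```

Because the labels are produced together with the audio, no manual annotation is needed for examples generated this way, which is the property the abstract highlights for the Generate version.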

Jason Hockman | Ryan Stables | Carl Southall
