Deep Unsupervised Drum Transcription

We introduce DrummerNet, a drum transcription system that is trained in an unsupervised manner. DrummerNet does not require any ground-truth transcription and, exploiting the data scalability of deep neural networks, learns from a large unlabeled dataset. In DrummerNet, the target drum signal is first passed to a (trainable) transcriber and then reconstructed by a (fixed) synthesizer according to the transcription estimate. By training the system to minimize the distance between the input and output audio signals, the transcriber learns to transcribe without ground-truth transcription. Our experiment shows that DrummerNet performs favorably compared to many other recent drum transcription systems, both supervised and unsupervised.
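The training idea described above (a trainable transcriber, a fixed synthesizer, and a loss on the distance between input and reconstruction) can be illustrated with a toy sketch. This is not DrummerNet's architecture: the one-shot `TEMPLATE`, the 3-tap linear transcriber, the mean-squared-error loss, and the finite-difference gradient descent are all assumptions chosen so the example runs in plain Python without a deep-learning library.

```python
# Toy analysis-synthesis loop: a trainable transcriber feeds a fixed synthesizer.
# Everything here (TEMPLATE, the 3-tap filter, finite-difference gradients) is an
# illustrative assumption, not DrummerNet's actual architecture.

TEMPLATE = [1.0, 0.5, 0.25]  # hypothetical one-shot "drum sample"


def synthesize(activations):
    """Fixed synthesizer: add a scaled copy of TEMPLATE at every frame."""
    out = [0.0] * len(activations)
    for t, amp in enumerate(activations):
        for k, sample in enumerate(TEMPLATE):
            if t + k < len(out):
                out[t + k] += amp * sample
    return out


def transcribe(signal, weights):
    """Trainable transcriber: causal linear filter followed by a ReLU."""
    acts = []
    for t in range(len(signal)):
        z = sum(w * signal[t - k] for k, w in enumerate(weights) if t - k >= 0)
        acts.append(max(0.0, z))
    return acts


def reconstruction_loss(signal, weights):
    """Mean squared error between the input and its resynthesis."""
    recon = synthesize(transcribe(signal, weights))
    return sum((x - y) ** 2 for x, y in zip(signal, recon)) / len(signal)


def train(signal, weights, steps=500, lr=0.1, eps=1e-4):
    """Gradient descent on the reconstruction loss (central differences)."""
    weights = list(weights)
    for _ in range(steps):
        grads = []
        for i in range(len(weights)):
            up = list(weights)
            down = list(weights)
            up[i] += eps
            down[i] -= eps
            grads.append(
                (reconstruction_loss(signal, up) - reconstruction_loss(signal, down))
                / (2 * eps)
            )
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


# Unlabeled input: only the audio is given to train(); onsets are never shown.
signal = synthesize([0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0])
initial_weights = [0.1, 0.1, 0.1]
initial_loss = reconstruction_loss(signal, initial_weights)
weights = train(signal, initial_weights)
final_loss = reconstruction_loss(signal, weights)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
print("estimated activations:", [round(a, 2) for a in transcribe(signal, weights)])
```

Because the synthesizer is fixed, the only way for the transcriber to lower the loss in this sketch is to emit activations from which the synthesizer can rebuild the input, which is what lets the transcription emerge without labels.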
