Drum Transcription via Joint Beat and Drum Modeling Using Convolutional Recurrent Neural Networks

Existing systems for automatic transcription of drum tracks from polyphonic music focus on detecting drum instrument onsets but do not consider additional meta information such as bar boundaries, tempo, and meter. We address this limitation by proposing a system that detects drum instrument onsets together with the corresponding beats and downbeats. This design lets the system exploit information on the rhythmical structure of a song, which is closely related to the desired drum transcript. To this end, we introduce and compare different architectures for this task, namely recurrent, convolutional, and recurrent-convolutional neural networks. We evaluate our systems on two well-known data sets and an additional new data set containing both drum and beat annotations. We show that convolutional and recurrent-convolutional neural networks perform better than state-of-the-art methods and that learning beats jointly with drums can be beneficial for the task of drum detection.
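
To illustrate what learning beats jointly with drums could look like at the level of training targets, the following is a minimal sketch, not the authors' implementation. It builds one frame-wise binary target matrix whose columns cover both drum instruments and beat/downbeat events, so a single network could be trained on all of them at once. The frame rate of 100 frames per second, the three drum classes, and the names `joint_targets` and `time_to_frame` are assumptions made for this example only.

```python
from typing import Dict, List

FRAME_RATE = 100  # frames per second; assumed value, not taken from the paper
DRUM_CLASSES = ["bass", "snare", "hihat"]  # assumed instrument set
BEAT_CLASSES = ["beat", "downbeat"]


def time_to_frame(t: float, frame_rate: int = FRAME_RATE) -> int:
    """Map a time in seconds to the nearest frame index."""
    return int(round(t * frame_rate))


def joint_targets(
    drum_onsets: Dict[str, List[float]],
    beats: List[float],
    downbeats: List[float],
    duration: float,
    frame_rate: int = FRAME_RATE,
) -> List[List[int]]:
    """Build a (frames x classes) binary matrix for joint drum/beat training.

    Columns follow DRUM_CLASSES + BEAT_CLASSES. A cell is 1 when the
    corresponding event is annotated at that frame, otherwise 0.
    """
    classes = DRUM_CLASSES + BEAT_CLASSES
    n_frames = time_to_frame(duration, frame_rate) + 1
    targets = [[0] * len(classes) for _ in range(n_frames)]

    # Treat beats and downbeats as two extra "instruments".
    events = dict(drum_onsets)
    events["beat"] = beats
    events["downbeat"] = downbeats

    for col, name in enumerate(classes):
        for t in events.get(name, []):
            frame = time_to_frame(t, frame_rate)
            if 0 <= frame < n_frames:  # ignore annotations outside the excerpt
                targets[frame][col] = 1
    return targets


if __name__ == "__main__":
    example = joint_targets(
        {"bass": [0.0, 0.5], "snare": [0.25]},
        beats=[0.0, 0.5],
        downbeats=[0.0],
        duration=1.0,
    )
    print(example[0], example[25], example[50])
```

With targets laid out this way, a drum-only model would simply drop the last two columns, which makes the joint and drum-only settings easy to compare under otherwise identical conditions.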
