Convolving Gaussian Kernels for RNN-Based Beat Tracking

Recurrent Neural Networks (RNNs) and bi-directional RNNs (BRNNs) have been used for beat tracking with good performance because of their ability to model contextual information. However, two problems are associated with RNN-based beat tracking. The first is imbalanced data: typically only around 2% of frames are labelled as 'beat'. The second is disagreement among human annotators on the precise positions of beats, as well as the delay introduced when annotations are produced by human tapping. To tackle these problems, we propose convolving the original ground truth with a Gaussian kernel and using the result as the target output of the network for more robust training. We conduct a comparison experiment using five different Gaussian kernels on five individual datasets. The results on the validation sets show that, with a properly chosen Gaussian kernel, training on the convolved ground truth yields a better, or at least competitive, model in a shorter time.
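The core idea of the proposed target smoothing can be sketched as follows. This is a minimal illustration, not the authors' implementation: the kernel width, the standard deviation, and the input format (a list of annotated beat-frame indices) are assumptions for the sake of the example.

```python
import numpy as np

def gaussian_kernel(width=2, sigma=1.0):
    """Symmetric Gaussian kernel of length 2*width+1 with peak value 1.0."""
    x = np.arange(-width, width + 1)
    return np.exp(-x**2 / (2.0 * sigma**2))

def smooth_targets(beat_frames, n_frames, width=2, sigma=1.0):
    """Convolve a binary beat annotation with a Gaussian kernel.

    beat_frames: frame indices annotated as 'beat' (assumed input format).
    Returns a soft target vector in [0, 1] for network training, so frames
    near an annotated beat also receive a (smaller) positive target.
    """
    targets = np.zeros(n_frames)
    targets[np.asarray(beat_frames)] = 1.0
    smoothed = np.convolve(targets, gaussian_kernel(width, sigma), mode="same")
    # Clip in case kernels of closely spaced beats overlap and sum above 1.
    return np.clip(smoothed, 0.0, 1.0)

# Example: beats annotated at frames 10 and 50 of a 100-frame excerpt.
soft = smooth_targets([10, 50], 100)
```

Compared with the original 0/1 targets, the smoothed targets both increase the fraction of non-zero frames (mitigating the class imbalance) and tolerate small annotation offsets, since a prediction one frame away from the annotated beat is still partially rewarded.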
