WaveBeat: End-to-end beat and downbeat tracking in the time domain

Deep learning approaches for beat and downbeat tracking have brought significant advances. However, these approaches continue to rely on hand-crafted, subsampled spectral features as input, restricting the information available to the model. In this work, we propose WaveBeat, an end-to-end approach for joint beat and downbeat tracking that operates directly on waveforms. This method forgoes engineered spectral features and instead produces beat and downbeat predictions directly from the waveform, the first of its kind for this task. Our model employs temporal convolutional networks (TCNs) that achieve a very large receptive field (≥ 30 s) at audio sample rates in a memory-efficient manner by using rapidly growing dilation factors with fewer layers. Combined with a straightforward data augmentation strategy, our method outperforms previous state-of-the-art methods on some datasets, while producing comparable results on others, demonstrating the potential of time-domain approaches.
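To make the architectural idea concrete, below is a minimal sketch of a waveform TCN with rapidly growing dilation factors, written in PyTorch. The specific hyperparameters here (mono input, 7 layers, 32 channels, kernel size 15, dilation growth factor 8, a sigmoid beat/downbeat head, 22.05 kHz sample rate) are illustrative assumptions for this sketch, not the paper's exact configuration; the point is how geometric dilation growth yields a receptive field of many seconds from only a handful of layers.

```python
# Minimal sketch of a waveform TCN with rapidly growing dilations.
# Hyperparameters are illustrative assumptions, not the authors' exact setup.
import torch
import torch.nn as nn


class TCNBlock(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2  # "same" (non-causal) padding
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size,
                              dilation=dilation, padding=pad)
        self.act = nn.PReLU()

    def forward(self, x):
        return self.act(self.conv(x))


class WaveformTCN(nn.Module):
    """Stack of dilated conv blocks; the dilation grows 8x per layer."""

    def __init__(self, n_layers=7, channels=32, kernel_size=15, growth=8):
        super().__init__()
        layers, in_ch = [], 1  # mono waveform input
        for i in range(n_layers):
            layers.append(TCNBlock(in_ch, channels, kernel_size, growth ** i))
            in_ch = channels
        self.net = nn.Sequential(*layers)
        # two activation outputs per sample: beat and downbeat
        self.head = nn.Conv1d(channels, 2, kernel_size=1)

    def forward(self, x):  # x: (batch, 1, samples)
        return torch.sigmoid(self.head(self.net(x)))


def receptive_field(n_layers=7, kernel_size=15, growth=8):
    """Receptive field, in samples, of the stack above."""
    return 1 + sum((kernel_size - 1) * growth ** i for i in range(n_layers))


rf = receptive_field()
print(rf, rf / 22050)  # ~4.19 M samples, i.e. ~190 s at 22.05 kHz
```

With these assumed settings the receptive field already far exceeds the ≥ 30 s figure in the abstract: each additional layer multiplies the dilated span by the growth factor, which is why a growth factor of 8 needs far fewer layers than the more common dilation doubling to cover the same temporal context at audio sample rates.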
