Multi-Task Learning of Tempo and Beat: Learning One to Improve the Other

We propose a multi-task learning approach for simultaneous tempo estimation and beat tracking of musical audio. The system achieves state-of-the-art performance on both tasks across a wide range of data and offers a further fundamental advantage: by virtue of its multi-task nature, it not only exploits the mutual information of the two tasks through a common, shared representation, but can also improve one task by learning only from the other. Multi-task learning is achieved by globally aggregating the skip connections of a beat tracking system built around temporal convolutional networks and feeding them into a tempo classification layer. The benefit of this approach is investigated by including training data for which only tempo annotations are available, which is shown to improve beat tracking accuracy.
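The architecture described above can be sketched as follows. This is a minimal, hypothetical PyTorch illustration of the general idea, not the paper's exact configuration: the number of blocks, channel width, kernel size, dilation schedule, and tempo class count are all illustrative assumptions. The key point it demonstrates is that the beat head reads the TCN's sequential output frame by frame, while the tempo head reads a globally aggregated (summed, then time-pooled) version of the skip connections.

```python
import torch
import torch.nn as nn


class TCNBlock(nn.Module):
    """One dilated convolutional block returning a residual output and a skip connection.
    Kernel size and activation are assumptions for illustration."""

    def __init__(self, channels: int, dilation: int):
        super().__init__()
        # padding = 2 * dilation keeps the frame count unchanged for kernel size 5
        self.conv = nn.Conv1d(channels, channels, kernel_size=5,
                              padding=2 * dilation, dilation=dilation)
        self.act = nn.ELU()
        self.skip = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):
        y = self.act(self.conv(x))
        return x + y, self.skip(y)  # (residual output, skip connection)


class MultiTaskTempoBeat(nn.Module):
    """Sketch of the multi-task idea: per-frame beat activations from the TCN
    output, and a tempo class distribution from the globally aggregated skips."""

    def __init__(self, channels: int = 16, n_tempo_classes: int = 300):
        super().__init__()
        # geometrically increasing dilations, as is typical for TCNs
        self.blocks = nn.ModuleList(TCNBlock(channels, 2 ** i) for i in range(8))
        self.beat_head = nn.Conv1d(channels, 1, kernel_size=1)
        self.tempo_head = nn.Linear(channels, n_tempo_classes)

    def forward(self, x):  # x: (batch, channels, frames), e.g. a learned feature map
        skips = []
        for block in self.blocks:
            x, s = block(x)
            skips.append(s)
        skip_sum = torch.stack(skips).sum(dim=0)        # aggregate skip connections
        beats = torch.sigmoid(self.beat_head(x))        # per-frame beat activation
        tempo = self.tempo_head(skip_sum.mean(dim=-1))  # global average pooling over time
        return beats.squeeze(1), tempo                  # (batch, frames), (batch, classes)
```

Because both heads share the same TCN trunk, a batch carrying only tempo labels can still update the shared representation (by backpropagating only the tempo loss), which is how tempo-only data can improve beat tracking.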
