Multi-Task Learning of Tempo and Beat: Learning One to Improve the Other

We propose a multi-task learning approach for simultaneous tempo estimation and beat tracking of musical audio. The system achieves state-of-the-art performance on both tasks across a wide range of data and offers a further fundamental advantage: by virtue of its multi-task nature, it not only exploits the mutual information of the two tasks through a common, shared representation, but can also improve one task by learning only from the other. Multi-task learning is achieved by globally aggregating the skip connections of a beat tracking system built around temporal convolutional networks and feeding them into a tempo classification layer. The benefit of this approach is investigated by including training data for which only tempo annotations are available, which is shown to improve beat tracking accuracy.
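The architecture described above can be sketched as follows. This is a minimal, hypothetical PyTorch illustration of the general idea, not the paper's exact configuration: the number of blocks, channel width, kernel size, dilation schedule, and tempo class count are all illustrative assumptions. The key point it demonstrates is that the beat head reads the TCN's sequential output frame by frame, while the tempo head reads a globally aggregated (summed, then time-pooled) version of the skip connections.

```python
import torch
import torch.nn as nn


class TCNBlock(nn.Module):
    """One dilated convolutional block returning a residual output and a skip connection.
    Kernel size and activation are assumptions for illustration."""

    def __init__(self, channels: int, dilation: int):
        super().__init__()
        # padding = 2 * dilation keeps the frame count unchanged for kernel size 5
        self.conv = nn.Conv1d(channels, channels, kernel_size=5,
                              padding=2 * dilation, dilation=dilation)
        self.act = nn.ELU()
        self.skip = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):
        y = self.act(self.conv(x))
        return x + y, self.skip(y)  # (residual output, skip connection)


class MultiTaskTempoBeat(nn.Module):
    """Sketch of the multi-task idea: per-frame beat activations from the TCN
    output, and a tempo class distribution from the globally aggregated skips."""

    def __init__(self, channels: int = 16, n_tempo_classes: int = 300):
        super().__init__()
        # geometrically increasing dilations, as is typical for TCNs
        self.blocks = nn.ModuleList(TCNBlock(channels, 2 ** i) for i in range(8))
        self.beat_head = nn.Conv1d(channels, 1, kernel_size=1)
        self.tempo_head = nn.Linear(channels, n_tempo_classes)

    def forward(self, x):  # x: (batch, channels, frames), e.g. a learned feature map
        skips = []
        for block in self.blocks:
            x, s = block(x)
            skips.append(s)
        skip_sum = torch.stack(skips).sum(dim=0)        # aggregate skip connections
        beats = torch.sigmoid(self.beat_head(x))        # per-frame beat activation
        tempo = self.tempo_head(skip_sum.mean(dim=-1))  # global average pooling over time
        return beats.squeeze(1), tempo                  # (batch, frames), (batch, classes)
```

Because both heads share the same TCN trunk, a batch carrying only tempo labels can still update the shared representation (by backpropagating only the tempo loss), which is how tempo-only data can improve beat tracking.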
