User-Driven Fine-Tuning for Beat Tracking

The extraction of beat locations from musical audio signals is a foundational task in music information retrieval. While the use of deep neural networks has brought substantial gains in performance, significant shortcomings remain. In particular, accuracy is generally much lower on musical content that differs from the annotated datasets used for network training, as well as under challenging musical conditions such as rubato. In this paper, we approached beat tracking from a real-world perspective in which an end-user requires very high accuracy on specific music pieces for which the current state of the art is not effective. To this end, we explored the targeted fine-tuning of a state-of-the-art deep neural network using a very limited temporal region of annotated beat locations. We demonstrated the success of our approach via improved performance across existing annotated datasets and a new annotation-correction methodology for evaluation. Furthermore, we highlighted the ability of content-specific fine-tuning to learn both what is and what is not the beat under challenging musical conditions.
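To make the fine-tuning step concrete, the sketch below shows one plausible realization in PyTorch: a pretrained beat-tracking network is briefly re-trained on frame-wise beat-activation targets derived from a single short, user-annotated excerpt. The model object, tensor shapes, loss, and hyperparameters are illustrative assumptions for this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

def fine_tune_on_excerpt(model, excerpt_features, excerpt_targets,
                         epochs=50, lr=1e-4):
    """Hypothetical content-specific fine-tuning on one annotated region.

    model:            a pretrained beat-tracking network (placeholder; any
                      nn.Module mapping input frames to per-frame beat logits)
    excerpt_features: (1, T, F) input feature frames of the annotated excerpt
    excerpt_targets:  (1, T) frame-wise beat activations in [0, 1], derived
                      from the user's annotated beat locations
    """
    model.train()
    # A small learning rate and few epochs adapt the network to the target
    # piece while limiting how far it drifts from its pretrained behaviour.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCEWithLogitsLoss()  # frame-wise beat / non-beat targets

    for _ in range(epochs):
        optimizer.zero_grad()
        logits = model(excerpt_features).squeeze(-1)  # (1, T) beat logits
        loss = criterion(logits, excerpt_targets)
        loss.backward()
        optimizer.step()
    return model
```

At inference time, the fine-tuned model would then be applied to the full recording, with the usual post-processing stage recovering beat times from the predicted activation function.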
