Improving Peak-picking Using Multiple Time-step Loss Functions

The majority of state-of-the-art systems for music information retrieval (MIR) tasks now rely on deep learning methods trained by minimising loss functions such as cross entropy. For tasks that involve framewise binary classification (e.g., onset detection, music transcription), classes are derived from the output activation function by identifying points of local maxima, or peaks. However, the operating principles behind peak-picking differ from those of the cross entropy loss function, which minimises the difference between the output and target values of a single frame. To generate activation functions more suited to peak-picking, we propose two versions of a new loss function that incorporates information from multiple time-steps: 1) multi-individual, which uses multiple individual time-step cross entropies; and 2) multi-difference, which directly compares the difference between sequential time-step outputs. We evaluate the newly proposed loss functions alongside standard cross entropy in the popular MIR tasks of onset detection and automatic drum transcription. The results highlight the effectiveness of these loss functions in improving overall system accuracy for both MIR tasks. Additionally, directly comparing outputs from sequential time-steps, as in the multi-difference approach, achieves the highest performance.
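The abstract does not give the exact formulations, but the two losses can be illustrated with a minimal sketch. The following is one plausible reading only, assuming sigmoid framewise outputs and binary onset targets of shape (batch, time); the window size `context` and the weight `alpha` are hypothetical parameters, not values taken from the paper.

```python
import torch
import torch.nn.functional as F


def multi_individual_loss(y_pred, y_true, context=1):
    """One plausible reading of the 'multi-individual' loss: the usual
    per-frame binary cross entropy plus additional individual cross-entropy
    terms that pair each output frame with targets at neighbouring
    time-steps. `context` (the number of neighbouring steps) is assumed.
    """
    loss = F.binary_cross_entropy(y_pred, y_true)
    for k in range(1, context + 1):
        # extra cross-entropy terms against targets shifted by +/- k frames
        loss = loss + F.binary_cross_entropy(y_pred[:, k:], y_true[:, :-k])
        loss = loss + F.binary_cross_entropy(y_pred[:, :-k], y_true[:, k:])
    return loss


def multi_difference_loss(y_pred, y_true, alpha=1.0):
    """One plausible reading of the 'multi-difference' loss: standard
    cross entropy plus a term that compares the difference between
    sequential output frames with the corresponding target difference,
    encouraging sharper, more peak-like activations. `alpha` is an
    assumed weighting between the two terms.
    """
    ce = F.binary_cross_entropy(y_pred, y_true)
    pred_diff = y_pred[:, 1:] - y_pred[:, :-1]   # output change between frames
    true_diff = y_true[:, 1:] - y_true[:, :-1]   # target change between frames
    return ce + alpha * torch.abs(pred_diff - true_diff).mean()
```

Under this reading, the multi-difference variant penalises outputs whose frame-to-frame changes do not match those of the target sequence, which would tend to produce activation functions with clearer local maxima for the peak-picker.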
