Multitask Learning for Polyphonic Piano Transcription, a Case Study

We view polyphonic piano transcription as a multitask learning problem in which the onsets, intermediate frames, and offsets of notes must be predicted simultaneously. Using a variety of suitable convolutional neural network architectures, we investigate how additional prediction targets affect performance, and we quantify the resulting performance differences on the large MAESTRO dataset.
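
To make the multitask framing concrete, the following is a minimal sketch of how per-target losses for onsets, frames, and offsets might be combined into a single training objective. The abstract does not state the loss function or the task weighting, so the binary cross-entropy loss, the equal default weights, and the names `binary_cross_entropy` and `multitask_loss` are all assumptions made for illustration.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def multitask_loss(predictions, targets, weights=None):
    """Weighted sum of per-task losses (hypothetical helper, not from the paper).

    predictions, targets: dicts mapping a task name to a flat list of values.
    weights: optional dict of per-task weights; defaults to 1.0 for every task.
    Returns the combined loss and a dict of the individual task losses.
    """
    if weights is None:
        weights = {task: 1.0 for task in predictions}
    per_task = {
        task: binary_cross_entropy(predictions[task], targets[task])
        for task in predictions
    }
    total = sum(weights[task] * loss for task, loss in per_task.items())
    return total, per_task


# Toy example: one pitch over four time frames.
targets = {
    "onset": [1, 0, 0, 0],   # note starts in frame 0
    "frame": [1, 1, 1, 0],   # note sounds in frames 0 to 2
    "offset": [0, 0, 0, 1],  # note ends in frame 3
}
predictions = {
    "onset": [0.9, 0.1, 0.1, 0.1],
    "frame": [0.8, 0.7, 0.6, 0.2],
    "offset": [0.1, 0.1, 0.2, 0.7],
}
total, per_task = multitask_loss(predictions, targets)
```

Dropping a key from both dictionaries removes that objective, which is one simple way to compare a model trained with an additional prediction target against one trained without it.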
