Paper information - Towards end-to-end polyphonic music transcription: Transforming music audio directly to a score

We present a neural network model that learns to produce music scores directly from audio signals. Instead of employing commonplace processing steps, such as frequency transform front-ends, harmonicity and scale priors, or temporal pitch smoothing, we show that a neural network can learn such steps on its own when presented with the appropriate training data. We show how such a network can perform monophonic transcription with very high accuracy, and how it also generalizes well to transcribing polyphonic music.

Paris Smaragdis | Ralf Gunter Correa Carvalho

[1] Lale Akarun, et al. Large scale polyphonic music transcription using randomized matrix decompositions, 2012, 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO).

[2] Anssi Klapuri, et al. Signal Processing Methods for Music Transcription, 2006.

[3] Roland Badeau, et al. Harmonic Adaptive Latent Component Analysis of Audio and Application to Music Transcription, 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[4] Markus Schedl, et al. Polyphonic piano note transcription with recurrent neural networks, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[5] Jürgen Schmidhuber, et al. Learning to forget: continual prediction with LSTM, 1999.

[6] Simon Dixon, et al. A Shift-Invariant Latent Variable Model for Automatic Music Transcription, 2012, Computer Music Journal.

[7] Daniel P. W. Ellis, et al. Transcribing Multi-Instrument Polyphonic Music With Hierarchical Eigeninstruments, 2011, IEEE Journal of Selected Topics in Signal Processing.

[8] Jean Ponce, et al. A Theoretical Analysis of Feature Pooling in Visual Recognition, 2010, ICML.

[9] Yoshua Bengio, et al. Convolutional networks for images, speech, and time series, 1998.

[10] James Anderson Moorer, et al. On the segmentation and analysis of continuous musical sound by digital computer, 1975.

[11] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.

[12] Geoffrey E. Hinton, et al. Rectified Linear Units Improve Restricted Boltzmann Machines, 2010, ICML.

[13] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.

[14] Mark B. Sandler, et al. Automatic Piano Transcription Using Frequency and Time-Domain Information, 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[15] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.

[16] Brendt Wohlberg, et al. Piano music transcription with fast convolutional sparse coding, 2015, 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP).

[17] Daniel P. W. Ellis, et al. Melody Extraction from Polyphonic Music Signals: Approaches, applications, and challenges, 2014, IEEE Signal Processing Magazine.

[18] Han-Wen Nienhuys, et al. LilyPond, a System for Automated Music Engraving, 2003.