End-to-End Optical Music Recognition Using Neural Networks

This work addresses the Optical Music Recognition (OMR) task in an end-to-end fashion using neural networks. The proposed architecture is based on a Recurrent Convolutional Neural Network topology that takes an image of a monophonic score as input and outputs a sequence of music symbols. In the first stage, a series of convolutional filters is trained to extract meaningful features from the input image; a recurrent block then models the sequential nature of music. The system is trained with a Connectionist Temporal Classification (CTC) loss function, which removes the need for a frame-by-frame alignment between the image and the ground-truth music symbols. Experiments were carried out on a set of 90,000 synthetic monophonic music scores with more than 50 different possible labels. The results show symbol-level classification error rates of around 2%, demonstrating the potential of the proposed end-to-end architecture for OMR. The source code, dataset, and trained models are publicly released to support reproducible research and future comparisons.
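The CTC objective and the symbol-level error measure mentioned above can be made concrete with a small sketch. The plain-Python code below is an illustration under stated assumptions, not the authors' released implementation. It assumes the network emits one softmax distribution per frame (one frame per column of the convolutional feature map), with label index 0 reserved for the CTC blank. The function names and the toy probabilities are hypothetical. `ctc_neg_log_likelihood` computes the CTC loss with the forward algorithm, summing the probability of every frame-level path that collapses to the target symbol sequence, which is why no frame-by-frame alignment is required. `ctc_greedy_decode` performs best-path decoding. `symbol_error_rate` uses one common definition of symbol-level error (edit distance normalized by the reference length); this definition is assumed here and is not taken from the paper.

```python
import math
from typing import List, Sequence

BLANK = 0  # assumption: label index 0 is reserved for the CTC blank


def ctc_neg_log_likelihood(probs: Sequence[Sequence[float]],
                           target: Sequence[int], blank: int = BLANK) -> float:
    """Return -log p(target | image), summing over all frame alignments.

    probs[t][k] is the softmax probability of label k at frame t
    (assumed: one frame per column of the convolutional feature map).
    """
    # Extended label sequence with blanks around every symbol: [a, b] -> [_, a, _, b, _]
    ext = [blank]
    for symbol in target:
        ext += [symbol, blank]
    S = len(ext)
    # alpha[s]: total probability of all paths that have emitted ext[:s + 1] so far
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for frame in probs[1:]:
        new_alpha = [0.0] * S
        for s in range(S):
            total = alpha[s]  # stay on the same position
            if s >= 1:
                total += alpha[s - 1]  # advance by one position
            # Skipping a blank is allowed only between two different symbols
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * frame[ext[s]]
        alpha = new_alpha
    # A valid path ends on the last symbol or on the trailing blank
    p = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(p)


def ctc_greedy_decode(probs: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path decoding: argmax per frame, merge repeats, then drop blanks."""
    best = [max(range(len(frame)), key=lambda k: frame[k]) for frame in probs]
    decoded: List[int] = []
    prev = None
    for k in best:
        if k != prev and k != blank:
            decoded.append(k)
        prev = k
    return decoded


def symbol_error_rate(ref: Sequence[int], hyp: Sequence[int]) -> float:
    """Levenshtein distance between symbol sequences, normalized by len(ref)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)


if __name__ == "__main__":
    # Toy softmax output: 4 frames x 3 labels (0 = blank, 1 and 2 = two music symbols)
    probs = [[0.1, 0.8, 0.1],
             [0.1, 0.8, 0.1],
             [0.8, 0.1, 0.1],
             [0.1, 0.1, 0.8]]
    print(ctc_greedy_decode(probs))                                      # [1, 2]
    print(round(math.exp(-ctc_neg_log_likelihood(probs, [1, 2])), 4))   # 0.656
    print(symbol_error_rate([1, 2, 3, 4], [1, 2, 4]))                    # 0.25
```

In practice, a deep learning framework's built-in CTC loss would normally be used, and the recursion would be computed in log space to avoid numerical underflow on long sequences. The probability-space version here is kept only for readability.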
