End-to-End Neural Optical Music Recognition of Monophonic Scores

Optical Music Recognition is a field of research that investigates how to computationally decode music notation from images. Despite the efforts made so far, there are hardly any complete solutions to the problem. In this work, we study the use of neural networks that work in an end-to-end manner. This is achieved with a neural model that combines convolutional neural networks, which process the input image, with recurrent neural networks, which deal with the sequential nature of the problem. Thanks to the Connectionist Temporal Classification loss function, these models can be trained directly from input images paired with their corresponding transcripts as sequences of music symbols. We also present the Printed Music Scores dataset, containing more than 80,000 monodic single-staff real scores in common Western notation, which is used to train and evaluate the neural approach. Our experiments demonstrate that this formulation can be carried out successfully. Additionally, we study several considerations about the codification of the output musical sequences, the convergence and scalability of the neural models, and the ability of this approach to locate symbols in the input score.
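To make the role of Connectionist Temporal Classification more concrete, the following is a minimal sketch of greedy (best-path) decoding, which turns per-frame network outputs into a symbol sequence by merging consecutive repeats and dropping the blank label. It is an illustration under stated assumptions, not the decoding procedure used in this work: the vocabulary entries, the blank index, and the frame scores are all hypothetical. Recording the frame at which each symbol first appears gives a rough idea of how such a model might indicate where a symbol lies along the staff.

```python
from typing import List, Sequence, Tuple

# Hypothetical vocabulary; index 0 is assumed to be the CTC blank label.
BLANK = 0
VOCABULARY = ["<blank>", "clef-G2", "note-C4_quarter", "barline"]


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    vocabulary: Sequence[str],
) -> List[Tuple[str, int]]:
    """Best-path CTC decoding.

    Takes the highest-scoring label in each frame, merges consecutive
    repeats, and discards blanks. Returns (symbol, first_frame_index) pairs.
    """
    decoded: List[Tuple[str, int]] = []
    previous = BLANK
    for frame_index, scores in enumerate(frame_scores):
        # Index of the highest score in this frame.
        best = max(range(len(scores)), key=lambda i: scores[i])
        # Emit only when the label is not blank and differs from the
        # label chosen for the previous frame.
        if best != BLANK and best != previous:
            decoded.append((vocabulary[best], frame_index))
        previous = best
    return decoded


# Made-up per-frame scores for a seven-frame image (one row per frame).
frames = [
    [0.10, 0.80, 0.05, 0.05],  # clef
    [0.20, 0.70, 0.05, 0.05],  # clef again (merged with previous frame)
    [0.90, 0.03, 0.04, 0.03],  # blank
    [0.10, 0.10, 0.70, 0.10],  # note
    [0.80, 0.05, 0.10, 0.05],  # blank separates two identical notes
    [0.10, 0.05, 0.80, 0.05],  # same note, emitted again
    [0.10, 0.10, 0.10, 0.70],  # barline
]

result = ctc_greedy_decode(frames, VOCABULARY)
for symbol, frame_index in result:
    print(f"{symbol} (first seen at frame {frame_index})")
```

The blank label is what allows two identical symbols in a row to be recognized separately: without the blank frame between them, the two quarter notes above would collapse into one.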
