Further Steps Towards a Standard Testbed for Optical Music Recognition

Evaluating Optical Music Recognition (OMR) is notoriously difficult, and automated end-to-end OMR evaluation metrics are not available to guide development. In “Towards a Standard Testbed for Optical Music Recognition: Definitions, Metrics, and Page Images”, Byrd and Simonsen recently stressed that a benchmarking standard is needed in the OMR community, with regard to both data and evaluation metrics. We build on their analysis and definitions and present a prototype of an OMR benchmark. We do not, however, presume to present a complete solution to the complex problem of OMR benchmarking. Our contributions are: (a) an attempt to define a multilevel OMR benchmark dataset, together with a practical prototype implementation for both printed and handwritten scores, and (b) a corpus-based methodology for assessing automated evaluation metrics, together with an underlying corpus of more than 1,000 qualified relative cost-to-correct judgments. We then assess several straightforward automated MusicXML evaluation metrics against this corpus to establish a baseline that future metrics can improve upon.

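As a concrete illustration of the corpus-based assessment described above, the sketch below shows one way an automated MusicXML metric could be scored against human cost-to-correct judgments: compute the metric on OMR-output/ground-truth pairs and check how well it correlates (for example, by Spearman rank correlation) with the human judgments. The function names, the toy note-event representation, and the three-item corpus are illustrative assumptions, not the authors' actual metrics or data.

```python
import difflib
from scipy.stats import spearmanr  # assumes SciPy is installed

def flatten_notes(note_events):
    """Represent a score as a flat sequence of pitch/duration tokens (illustrative)."""
    return [f"{pitch}/{duration}" for pitch, duration in note_events]

def error_score(recognized, ground_truth):
    """Naive dissimilarity metric over flattened note events (0 = identical)."""
    matcher = difflib.SequenceMatcher(None,
                                      flatten_notes(recognized),
                                      flatten_notes(ground_truth))
    return 1.0 - matcher.ratio()

# Toy corpus: (OMR output, ground truth) pairs, each with a human relative
# cost-to-correct judgment (higher = more editing effort needed to fix).
corpus = [
    (([("C4", "quarter"), ("E4", "quarter")],
      [("C4", "quarter"), ("E4", "quarter")]), 0.0),
    (([("C4", "quarter"), ("F4", "quarter")],
      [("C4", "quarter"), ("E4", "quarter")]), 0.4),
    (([("D4", "half")],
      [("C4", "quarter"), ("E4", "quarter")]), 0.9),
]

metric_scores = [error_score(rec, gt) for (rec, gt), _ in corpus]
human_costs = [cost for _, cost in corpus]

# A useful automated metric should correlate with the human judgments.
rho, _ = spearmanr(metric_scores, human_costs)
print(f"Spearman correlation with cost-to-correct judgments: {rho:.2f}")
```

In this scheme, a candidate metric is judged useful to the extent that it ranks recognition outputs in roughly the same order as the human cost-to-correct judgments.
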
[1] Jakob Grue Simonsen et al. Towards a Standard Testbed for Optical Music Recognition: Definitions, Metrics, and Page Images, 2015.

[2] Carlos Guedes et al. Optical music recognition: state-of-the-art and open issues, 2012, International Journal of Multimedia Information Retrieval.

[3] Alicia Fornés et al. Writer Identification in Old Handwritten Music Scores, 2008, The Eighth IAPR International Workshop on Document Analysis Systems.

[4] Ichiro Fujinaga et al. Micro-level groundtruthing environment for OMR, 2004, ISMIR.

[5] Philipp Koehn et al. (Meta-) Evaluation of Machine Translation, 2007, WMT@ACL.

[6] Xiang Bai et al. An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7] Ondrej Bojar et al. A Grain of Salt for the WMT Manual Evaluation, 2011, WMT@EMNLP.

[8] Ondrej Bojar et al. Evaluating Machine Translation Quality Using Short Segments Annotations, 2015, Prague Bulletin of Mathematical Linguistics.

[9] Salim Roukos et al. BLEU: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.

[10] Ichiro Fujinaga et al. A Comparative Study of Staff Removal Algorithms, 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11] Alicia Fornés et al. CVC-MUSCIMA: a ground truth of handwritten music score images for writer identification and staff removal, 2012, International Journal on Document Analysis and Recognition (IJDAR).

[12] Ondrej Bojar et al. Results of the WMT14 Metrics Shared Task, 2014, WMT@ACL.

[13] Philipp Koehn et al. Findings of the 2010 Joint Workshop on Statistical Machine Translation and Metrics for Machine Translation, 2010, WMT@ACL.

[14] Alon Lavie et al. METEOR: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments, 2007, WMT@ACL.

[15] Ivan Bruno et al. Optical Music Imaging: Music Document Digitisation, Recognition, Evaluation, and Restoration, 2008.

[16] Pierfrancesco Bellini et al. Assessing Optical Music Recognition Tools, 2007, Computer Music Journal.

[17] Kia Ng et al. Improving OMR for Digital Music Libraries with Multiple Recognisers and Multiple Sources, 2014, DLfM '14.

[18] Donald Byrd et al. Towards Musicdiff: A Foundation for Improved Optical Music Recognition Using Multiple Recognizers, 2007, ISMIR.

[19] José Oncina et al. Recognition of Pen-Based Music Notation: The HOMUS Dataset, 2014, 22nd International Conference on Pattern Recognition.

[20] Mariusz Szwoch et al. Using MusicXML to Evaluate Accuracy of OMR Systems, 2008, Diagrams.