Learning to Read and Follow Music in Complete Score Sheet Images

This paper addresses the task of score following in sheet music given as unprocessed images. Existing work either relies on optical music recognition (OMR) software to obtain a computer-readable score representation or depends crucially on prepared excerpts of sheet images. We propose the first system that performs score following directly on full-page, completely unprocessed sheet images. Given incoming audio and an image of the score, our system directly predicts the most likely position on the page that matches the audio, and it outperforms current state-of-the-art image-based score followers in alignment precision. We also compare our method to an OMR-based approach and show empirically that it can be a viable alternative to such a system.
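To make "the most likely position on the page" and "alignment precision" concrete, the toy sketch below assumes that some model assigns a score to every location of the page image for the current audio excerpt. The score map, the function names (`softmax_2d`, `most_likely_position`, `alignment_error`) and the Euclidean-distance error are hypothetical illustrations, not the architecture or evaluation protocol of the system described above.

```python
import math


def softmax_2d(score_map):
    """Normalize raw scores (a list of rows) into a probability map over positions.

    Assumption: score_map stands in for a hypothetical model output that rates
    how well each page location matches the most recent audio.
    """
    peak = max(max(row) for row in score_map)  # subtract max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in score_map]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def most_likely_position(prob_map):
    """Return the (row, col) of the highest-probability cell (the argmax)."""
    best_value, best_pos = None, (0, 0)
    for r, row in enumerate(prob_map):
        for c, value in enumerate(row):
            if best_value is None or value > best_value:
                best_value, best_pos = value, (r, c)
    return best_pos


def alignment_error(predicted, target):
    """Euclidean distance (in grid cells) between predicted and true position."""
    return math.dist(predicted, target)


if __name__ == "__main__":
    # Hypothetical 3x3 score map; the peak sits at row 1, column 1.
    scores = [
        [0.1, 0.3, 0.2],
        [0.4, 2.5, 0.9],
        [0.0, 0.7, 0.3],
    ]
    pos = most_likely_position(softmax_2d(scores))
    print(pos, alignment_error(pos, (1, 2)))
```

Because softmax is monotonic, the argmax of the probability map equals the argmax of the raw scores; normalizing only matters if the probabilities themselves are used downstream, for example as a confidence value.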
