Towards Score Following In Sheet Music Images

This paper addresses the matching of short music audio snippets to the corresponding pixel location in images of sheet music. A system is presented that simultaneously learns to read notes, listen to music, and match the currently played music to its corresponding notes in the sheet. It consists of an end-to-end multi-modal convolutional neural network that takes as input images of sheet music and spectrograms of the respective audio snippets. For a given unseen audio snippet (covering approximately one bar of music), it learns to predict the corresponding position in the respective score line. Our results suggest that with the use of (deep) neural networks, which have proven to be powerful image processing models, working with sheet music becomes feasible and is a promising direction for future research.
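To make the two-input design concrete, the following is a minimal sketch of a model with one branch per modality whose features are joined to predict a horizontal position. It is not the system described above: the dense layers stand in for convolutional stacks, the weights are random rather than trained, and treating the position as a classification over `n_bins` horizontal bins is an assumption made for illustration. The names `TwoBranchLocator`, `position_probs`, and `predict_x` are hypothetical.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    m = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Small random weights; a real model would learn these from data."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class TwoBranchLocator:
    """Toy two-branch model (assumption: dense layers replace convolutions)."""

    def __init__(self, sheet_size, spec_size, hidden=8, n_bins=10, seed=0):
        rng = random.Random(seed)
        self.sheet_layer = init_layer(rng, hidden, sheet_size)
        self.audio_layer = init_layer(rng, hidden, spec_size)
        self.out_layer = init_layer(rng, n_bins, 2 * hidden)
        self.n_bins = n_bins

    def position_probs(self, sheet_pixels, spec_values):
        # Each modality is embedded by its own branch.
        sheet_feat = relu(dense(sheet_pixels, *self.sheet_layer))
        audio_feat = relu(dense(spec_values, *self.audio_layer))
        # Concatenate both feature vectors and score every position bin.
        joint = sheet_feat + audio_feat
        return softmax(dense(joint, *self.out_layer))

    def predict_x(self, sheet_pixels, spec_values, line_width_px):
        """Return the pixel x-coordinate at the centre of the most likely bin."""
        probs = self.position_probs(sheet_pixels, spec_values)
        best = max(range(self.n_bins), key=probs.__getitem__)
        return (best + 0.5) * line_width_px / self.n_bins


# Usage with random stand-in data (flattened toy "image" and "spectrogram").
rng = random.Random(1)
sheet = [rng.random() for _ in range(4 * 20)]
spec = [rng.random() for _ in range(6 * 5)]
model = TwoBranchLocator(len(sheet), len(spec))
probs = model.position_probs(sheet, spec)
x = model.predict_x(sheet, spec, line_width_px=200)
```

With untrained weights the predicted position is arbitrary; the sketch only shows how a sheet image and an audio excerpt can feed a single prediction over positions within one score line.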
