Score Following as a Multi-Modal Reinforcement Learning Problem

Score following is the process of tracking a musical performance (audio) in a corresponding symbolic representation (score). While methods that use computer-readable score representations as input achieve reliable tracking results, there is little research on score following based on raw score images. In this paper, we build on previous work that formulates the score following task as a multi-modal Markov Decision Process (MDP). Given this formal definition, the problem of score following can be addressed with state-of-the-art deep reinforcement learning (RL) algorithms. In particular, we design end-to-end multi-modal RL agents that simultaneously learn to listen to music recordings, read the scores from images of sheet music, and follow along in the sheet. Using algorithms such as synchronous Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO), we reproduce and further improve on existing results. We also present initial experiments indicating that this approach can be extended to track real piano recordings of human performances. These audio recordings are made openly available to the research community, along with precise note-level alignment ground truth.
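To make the MDP formulation concrete, the following is a minimal environment sketch of the kind of interaction loop the abstract describes: at each step the agent receives a multi-modal observation (an audio spectrogram excerpt and a sheet-image window around its current position) and chooses a discrete action that adjusts its reading speed through the sheet. All sizes, the action set, and the reward shape are illustrative placeholders, not the paper's actual specification; random arrays stand in for real spectrograms and sheet snippets.

```python
import numpy as np


class ScoreFollowingEnv:
    """Illustrative sketch of a multi-modal score-following MDP.

    Observation/action dimensions and the reward function are
    assumptions for demonstration only.
    """

    # Discrete actions: decrease, keep, or increase pixel speed in the sheet.
    ACTIONS = (-1.0, 0.0, +1.0)

    def __init__(self, n_steps=100, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_steps = n_steps
        # Ground-truth target position in the sheet per time step (pixels),
        # simulating the true progression of the performance.
        self.target = np.cumsum(self.rng.uniform(0.5, 1.5, size=n_steps))
        self.reset()

    def reset(self):
        self.t = 0
        self.pos = 0.0    # agent's current position in the sheet (pixels)
        self.speed = 1.0  # current pixel speed
        return self._observation()

    def _observation(self):
        # Multi-modal observation: an audio excerpt and a sheet snippet.
        spectrogram = self.rng.random((78, 40))    # freq bins x frames (placeholder)
        sheet_window = self.rng.random((80, 256))  # pixels around self.pos (placeholder)
        return spectrogram, sheet_window

    def step(self, action_idx):
        # The action modulates reading speed rather than position directly,
        # so the agent must anticipate tempo changes to stay on target.
        self.speed += self.ACTIONS[action_idx]
        self.pos += self.speed
        self.t += 1
        # Reward decreases with the tracking error to the true position.
        error = abs(self.pos - self.target[self.t - 1])
        reward = max(0.0, 1.0 - error / 50.0)
        done = self.t >= self.n_steps
        return self._observation(), reward, done
```

An RL agent such as A2C or PPO would then be trained on this interaction loop, with convolutional encoders for each modality feeding a shared policy and value head.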
