Exploiting Instrument-wise Playing/Non-Playing Labels for Score Synchronization of Symphonic Music

Synchronization of a score to an audio-visual music performance recording is usually done by solving an audio-to-MIDI alignment problem. In this paper, we focus on representing both the score and the performance using information about which instrument is active at a given time stamp. More specifically, we investigate to what extent instrument-wise "playing" (P) and "non-playing" (NP) labels are informative in the synchronization process, and what role the visual channel can play in extracting P/NP labels. After introducing the P/NP-based representation of the music piece, both at the score and performance level, we define an efficient way of computing the distance between the two representations, which serves as input for the synchronization step based on dynamic time warping. In parallel with assessing the effectiveness of the proposed representation, we also study its robustness to missing and/or erroneous labels. Our experimental results show that the P/NP-based representation of a music piece is informative for performance-to-score synchronization and may complement existing audio-only approaches.
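The pipeline described above can be illustrated with a minimal sketch: represent score and performance as binary matrices of per-instrument P/NP labels (one row per time frame, one column per instrument), use the Hamming distance between frames as the local cost, and align the two sequences with standard dynamic time warping. Note that this is an illustrative reconstruction under assumed conventions (the `pnp_dtw` function name, frame layout, and step pattern are not taken from the paper, whose actual distance measure and DTW variant may differ).

```python
import numpy as np

def pnp_dtw(score_pnp, perf_pnp):
    """Align two P/NP label sequences with dynamic time warping.

    score_pnp, perf_pnp: binary arrays of shape (frames, instruments),
    where 1 = playing and 0 = non-playing for that instrument.
    Returns the accumulated cost matrix and the optimal warping path.
    """
    n, m = len(score_pnp), len(perf_pnp)

    # Local cost: Hamming distance, i.e. the number of instruments whose
    # P/NP labels disagree between a score frame and a performance frame.
    cost = np.abs(score_pnp[:, None, :] - perf_pnp[None, :, :]).sum(axis=2)

    # Accumulate costs with the standard step pattern (down, right, diagonal).
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            acc[i, j] = cost[i, j] + best

    # Backtrack from the end to recover the warping path.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1, j - 1], (i - 1, j - 1)))
        if i > 0:
            candidates.append((acc[i - 1, j], (i - 1, j)))
        if j > 0:
            candidates.append((acc[i, j - 1], (i, j - 1)))
        _, (i, j) = min(candidates)
        path.append((i, j))
    return acc, path[::-1]
```

With this cost definition, a performance frame that repeats a score frame (e.g. a held note spanning several frames) contributes zero local cost, so tempo differences between score and performance are absorbed entirely by the warping path.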
