Alignment of Monophonic and Polyphonic Music to a Score

Music alignment associates events in a score with points on the time axis of an audio signal, so that the signal is segmented according to the score events. We propose a new methodology for automatic alignment based on dynamic time warping, in which the local distance is computed from the spectral peak structure and enhanced by a model of attacks and of silence. The methodology can cope with performances considered difficult to align, such as polyphonic music, trills, fast sequences, or multi-instrument music. An optimisation of the representation of the alignment path makes the method applicable to long sound files, so that unit databases can be segmented and labeled fully automatically. On 708 sequences of synthesised music, we achieved an average offset of 25 ms and an error rate of 2.5%.
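
As a rough illustration of the dynamic time warping step, the following minimal sketch aligns a sequence of score events to a sequence of audio frames and reads off one onset per event. It is not the method described above: the local distance is a plain absolute pitch difference standing in for the spectral peak structure distance and the attack and silence models, the full cost matrix is kept in memory (so the path-representation optimisation for long files is not reproduced), and the names `dtw_align`, `event_onsets`, and `pitch_distance` are hypothetical.

```python
import math


def dtw_align(score_feats, audio_feats, dist):
    """Classic DTW: return (total_cost, path) with path as (event, frame) pairs."""
    n, m = len(score_feats), len(audio_feats)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # cost[i][j]: cheapest alignment of the first i events to the first j frames.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(score_feats[i - 1], audio_feats[j - 1])
            # Allowed steps: diagonal, advance event only, advance frame only.
            candidates = [
                (cost[i - 1][j - 1], (i - 1, j - 1)),
                (cost[i - 1][j], (i - 1, j)),
                (cost[i][j - 1], (i, j - 1)),
            ]
            best, prev = min(candidates, key=lambda c: c[0])
            cost[i][j] = d + best
            back[i][j] = prev

    # Backtrack from the end of both sequences to the origin.
    path = []
    i, j = n, m
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        i, j = back[i][j]
    path.reverse()
    return cost[n][m], path


def event_onsets(path, frame_period_s):
    """Onset time of each score event: the first frame the path assigns to it."""
    first_frame = {}
    for event, frame in path:
        first_frame.setdefault(event, frame)
    return [first_frame[e] * frame_period_s for e in sorted(first_frame)]


def pitch_distance(score_pitch, frame_pitch):
    """Placeholder local distance (assumption): absolute MIDI pitch difference."""
    return abs(score_pitch - frame_pitch)


if __name__ == "__main__":
    score = [60, 62, 64]               # three score events (MIDI pitches)
    frames = [60, 60, 62, 62, 62, 64]  # one estimated pitch per audio frame
    total, path = dtw_align(score, frames, pitch_distance)
    print(total, path)
    print(event_onsets(path, frame_period_s=1.0))
```

In this toy setting each score event is segmented at the first frame the path assigns to it. A real system would replace `pitch_distance` with a distance computed on spectral features and would need a more memory-efficient path representation for long recordings.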