A state space model for online polyphonic audio-score alignment

We present a novel online audio-score alignment approach for multi-instrument polyphonic music. The approach uses a 2-dimensional state vector to model the underlying score position and tempo of each time frame of the audio performance. The process model is defined by dynamic equations that govern the transitions between states. Two representations of the observed audio frame are proposed, resulting in two observation models: a multi-pitch-based model and a chroma-based model. Particle filtering is used to infer the hidden states from the observations. Experiments on 150 music pieces with polyphony ranging from one to four show that the proposed approach outperforms an existing offline score alignment approach based on global string alignment. Results also show that the multi-pitch-based observation model works better than the chroma-based one.
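To make the state-space idea concrete, the following is a minimal sketch of a bootstrap particle filter over a 2-dimensional state (score position in beats, tempo in BPM). It is not the paper's implementation: the abstract does not give the dynamic equations or the observation models, so the constant-tempo-plus-noise process model, the Gaussian toy likelihood standing in for the multi-pitch or chroma observation model, and every name and parameter value (`predict`, `update`, `run_demo`, `hop_sec`, `sigma`, and so on) are assumptions made for illustration.

```python
import math
import random


def predict(particles, hop_sec, tempo_noise_bpm, rng):
    """Assumed process model: advance position by tempo, random-walk the tempo."""
    moved = []
    for pos, tempo in particles:
        new_pos = pos + hop_sec * tempo / 60.0  # beats covered in one audio hop
        new_tempo = max(1.0, tempo + rng.gauss(0.0, tempo_noise_bpm))
        moved.append((new_pos, new_tempo))
    return moved


def update(particles, likelihood, rng):
    """Weight each particle by the observation likelihood, then resample."""
    weights = [likelihood(pos) for pos, _ in particles]
    if sum(weights) <= 0.0:
        # All likelihoods underflowed: fall back to uniform weights.
        weights = [1.0] * len(particles)
    return rng.choices(particles, weights=weights, k=len(particles))


def estimate(particles):
    """Point estimate: mean position and mean tempo over the particle set."""
    n = len(particles)
    mean_pos = sum(pos for pos, _ in particles) / n
    mean_tempo = sum(tempo for _, tempo in particles) / n
    return mean_pos, mean_tempo


def run_demo(seed=0, n_particles=500, n_frames=200, hop_sec=0.01):
    """Track a simulated performance played at a constant 120 BPM."""
    rng = random.Random(seed)
    true_tempo, true_pos = 120.0, 0.0
    sigma = 0.05  # assumed width (in beats) of the toy likelihood
    # Start at the beginning of the score with an unknown tempo.
    particles = [(0.0, rng.uniform(60.0, 180.0)) for _ in range(n_particles)]
    for _ in range(n_frames):
        true_pos += hop_sec * true_tempo / 60.0

        # Toy stand-in for an audio observation model: score positions near
        # the true position explain the current frame best.
        def likelihood(pos, centre=true_pos):
            return math.exp(-0.5 * ((pos - centre) / sigma) ** 2)

        particles = update(predict(particles, hop_sec, 1.0, rng), likelihood, rng)
    est_pos, est_tempo = estimate(particles)
    return est_pos, est_tempo, true_pos


if __name__ == "__main__":
    est_pos, est_tempo, true_pos = run_demo()
    print(f"estimated position {est_pos:.3f} beats, true position {true_pos:.3f} beats")
    print(f"estimated tempo {est_tempo:.1f} BPM")
```

In a real system the `likelihood` function would score how well the score content at a candidate position explains the current audio frame, for example by comparing pitches or chroma vectors; here it is driven directly by the simulated true position so that the filtering loop can be run in isolation.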
