Bayesian Audio-to-Score Alignment Based on Joint Inference of Timbre, Volume, Tempo, and Note Onset Timings

This article presents an offline method for aligning an audio signal to the individual instrumental parts that make up a musical score. The proposed method fits multiple hidden semi-Markov models (HSMMs) to the observed audio signal, with each HSMM corresponding to one musical instrument's part. The emission probability of each HSMM state is described using latent harmonic allocation (LHA), a Bayesian model of a mixture of harmonic sounds, and the state duration probability is conditioned on a tempo model based on a linear dynamical system (LDS). Variational Bayesian inference is used to jointly infer the LHA, HSMM, and LDS models. We evaluate how well the method aligns musical audio to its score under reverberation, structural variations, and fluctuations in onset timing among different parts.
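The abstract does not give the model equations, so the following is only a rough sketch of what "a state duration probability conditioned on an LDS tempo model" can look like. It assumes a one-dimensional tempo (seconds per beat) that follows a Gaussian random walk, and a discretized Gaussian over each note's duration in frames, centred on the note's score length times the tempo. The function names `sample_tempo_lds` and `duration_pmf`, the noise parameters, the relative spread `rel_std`, and the Gaussian form are all hypothetical choices for illustration, not the paper's formulation.

```python
import numpy as np


def sample_tempo_lds(tau0, n_notes, process_std, rng):
    """Simulate a hypothetical 1-D linear dynamical tempo model.

    Tempo is expressed in seconds per beat and evolves as a Gaussian
    random walk: tau[n] = tau[n-1] + w[n], with w[n] ~ N(0, process_std^2).
    """
    taus = np.empty(n_notes)
    taus[0] = tau0
    for n in range(1, n_notes):
        taus[n] = taus[n - 1] + rng.normal(0.0, process_std)
    return taus


def duration_pmf(beats, tau, frame_sec, rel_std=0.1):
    """Hypothetical tempo-conditioned state-duration distribution for one note.

    The expected duration in frames is the note's score length (beats)
    times the tempo (seconds per beat) divided by the frame hop (seconds).
    A discretized Gaussian around that mean is assumed for illustration.
    Returns candidate durations (frames, starting at 1) and their
    normalized probabilities.
    """
    if tau <= 0:
        raise ValueError("tempo (seconds per beat) must be positive")
    mean = beats * tau / frame_sec
    std = max(rel_std * mean, 1.0)               # avoid a degenerate spike
    max_frames = int(np.ceil(mean + 5.0 * std))  # truncate the far tail
    d = np.arange(1, max_frames + 1)
    w = np.exp(-0.5 * ((d - mean) / std) ** 2)
    return d, w / w.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    score_beats = [1.0, 0.5, 0.5, 2.0]           # hypothetical note lengths in beats
    taus = sample_tempo_lds(0.5, len(score_beats), 0.02, rng)
    for beats, tau in zip(score_beats, taus):
        d, p = duration_pmf(beats, tau, frame_sec=0.01)
        print(f"{beats} beats at {tau:.3f} s/beat -> mode {d[np.argmax(p)]} frames")
```

Tying every duration distribution in a part to one slowly varying tempo is what lets the durations of consecutive notes inform each other; the paper infers its tempo and duration models jointly with LHA by variational Bayes rather than simulating them forward as this sketch does.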
