Audio-to-Score Alignment using Transposition-invariant Features

Audio-to-score alignment is an important pre-processing step for in-depth analysis of classical music. In this paper, we apply novel transposition-invariant audio features to this task. These low-dimensional features represent local pitch intervals and are learned in an unsupervised fashion by a gated autoencoder. Our results show that the proposed features are indeed fully transposition-invariant and enable accurate alignments between transposed scores and performances. Furthermore, they can even outperform widely used features for audio-to-score alignment on untransposed data, and are thus a viable and more flexible alternative to well-established features for music alignment and matching.
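The core idea, that features describing local pitch intervals rather than absolute pitch are unaffected by transposition, can be illustrated with a minimal sketch. It does not reproduce the gated autoencoder described above. As an explicit assumption, it replaces the learned features with a hand-crafted bilinear interaction between consecutive chroma frames, m[k] = Σ_i x[i] · y[(i + k) mod 12]. This is loosely in the spirit of the multiplicative frame-to-frame interactions a gated autoencoder models. The resulting feature sequences are then aligned with plain dynamic time warping (DTW). The toy melody, the durations, and all function names (`one_hot_chroma`, `interval_features`, `cosine_cost`, `dtw_path`) are hypothetical.

```python
import numpy as np


def one_hot_chroma(pitch_classes, durations):
    """Hypothetical toy chroma: each pitch class is held for `dur` frames."""
    frames = []
    for pc, dur in zip(pitch_classes, durations):
        vec = np.zeros(12)
        vec[pc % 12] = 1.0
        frames.extend([vec] * dur)
    return np.array(frames)


def interval_features(chroma):
    """Hand-crafted stand-in for learned interval features (NOT the paper's model).

    For consecutive frames x = chroma[t], y = chroma[t + 1], compute
    m[k] = sum_i x[i] * y[(i + k) % 12], i.e. how much pitch content moves up
    by k semitones. Circularly shifting both frames by the same amount cancels
    inside the sum, so the result is transposition-invariant.
    """
    x, y = chroma[:-1], chroma[1:]
    # np.roll(y, -k, axis=1)[:, i] == y[:, (i + k) % 12]
    return np.stack(
        [np.sum(x * np.roll(y, -k, axis=1), axis=1) for k in range(12)], axis=1
    )


def cosine_cost(a, b):
    """Pairwise cosine distance between rows of a (score) and rows of b (performance)."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-9)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-9)
    return 1.0 - a @ b.T


def dtw_path(cost):
    """Classic DTW with steps (1,1), (1,0), (0,1); returns the optimal warping path."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    # Backtrack from the end to the start, always taking the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    return path[::-1]


melody = [0, 4, 7, 5, 2, 0, 7, 9]                  # hypothetical toy melody
score = one_hot_chroma(melody, [2] * len(melody))   # "score": even note lengths
performance = one_hot_chroma(melody, [3, 2, 4, 2, 3, 5, 2, 3])  # tempo variations
transposed = np.roll(performance, 3, axis=1)         # same performance, up 3 semitones

f_score = interval_features(score)
f_perf = interval_features(performance)
f_trans = interval_features(transposed)

cost = cosine_cost(f_score, f_trans)
path = dtw_path(cost)
print(np.allclose(f_perf, f_trans))          # True: transposition does not change the features
print(sum(cost[i, j] for i, j in path))      # close to 0: a perfect alignment is found
```

Because the features of the transposed and untransposed performances are identical, DTW returns the same path for both. In this sketch, swapping `interval_features` for a learned feature extractor would leave the alignment step unchanged.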
