Abstract—Chroma representations of musical excerpts are nowadays very popular and used for a wide variety of applications. Within these, transposition to a common key or tonality can be an important step, usually having a dramatic impact on the final system's accuracy. We present and evaluate a new and straightforward way of transposing two chroma representations to a common key that outperforms previous methods based on key estimation and that, without detriment to accuracy, is computationally faster than trying all possible transpositions. In addition, we provide some insights into the internal organization of this new tool, suggesting that it organizes transposition indices in a coherent manner.

Index Terms—Music, Information retrieval, Acoustic signal analysis, Multidimensional sequences, Symbol manipulation

I. INTRODUCTION

Transposing musical excerpts to a common key or tonality is a necessary feature when comparing melodies, harmonies or any tonal representation of these musical excerpts. This process is especially crucial in many music information retrieval (MIR) tasks related to music similarity, such as audio matching and alignment [1], [2], song structure analysis [3] or cover song identification [4], [5], where melodic or harmonic representations of musical excerpts are used. Furthermore, it is a necessary feature for any music retrieval or recommendation engine comparing tonal information.

Chroma features, or pitch class profiles (PCP), have become very popular and are widely used in these and many other MIR-related tasks (e.g. key and chord estimation [6], [7]), as they provide a description of the audio tonal content that, ideally [8], (a) represents the pitch class distribution of both monophonic and polyphonic signals, (b) considers the presence of harmonic frequencies, (c) is robust to noise and non-tonal sounds, (d) is independent of timbre and played instrument, (e) is independent of loudness and dynamics, and (f) is independent of tuning, so that the reference frequency can be slightly different from the standard A 440 Hz. Chroma features (Fig. 1) are derived from the energy found within a given frequency range (typically from 50 to 5000 Hz) in short-time spectral representations (e.g. 100 ms) of audio signals, extracted on a frame-by-frame basis. This energy is usually collapsed into an octave-independent histogram representing the presence (or relative intensity) of each of the 12 semitones of an equal-tempered chromatic scale.
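To make the two notions above concrete — collapsing spectral energy into a 12-bin pitch class histogram, and transposition as a circular shift of those bins — here is a minimal sketch. It is not the paper's method; the mapping from frequency to pitch class, the 50–5000 Hz range, and the brute-force search over all 12 shifts are illustrative assumptions based on the description in this section.

```python
import numpy as np

def chroma_from_spectrum(mags, freqs, fmin=50.0, fmax=5000.0, ref=440.0):
    """Collapse a magnitude spectrum (parallel arrays of magnitudes and
    frequencies in Hz) into a 12-bin, octave-independent chroma histogram."""
    chroma = np.zeros(12)
    for m, f in zip(mags, freqs):
        if fmin <= f <= fmax and m > 0:
            # MIDI-style pitch number: A4 = 440 Hz maps to 69 (pitch class 9)
            pitch = 69 + 12 * np.log2(f / ref)
            chroma[int(round(pitch)) % 12] += m
    total = chroma.sum()
    return chroma / total if total > 0 else chroma  # normalize to unit sum

def transpose(chroma, semitones):
    """Transposing by n semitones is a circular shift of the 12 bins."""
    return np.roll(chroma, semitones)

def brute_force_transposition_index(a, b):
    """Exhaustive baseline: the shift of b that best matches a, scored by
    dot product over all 12 possible transpositions."""
    return int(np.argmax([np.dot(a, np.roll(b, k)) for k in range(12)]))
```

The exhaustive search in `brute_force_transposition_index` is the "trying all possible transpositions" baseline the abstract refers to; the paper's contribution is a faster way to choose this index without evaluating all 12 shifts.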
REFERENCES

[1] E. Gómez et al., "Automatic Extraction of Musical Structure Using Pitch Class Distribution Features," 2006.
[2] E. Gómez et al., "Audio cover song identification based on tonal sequence alignment," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2008.
[3] D. Temperley, The Cognition of Basic Musical Structures, 2001.
[4] J. P. Bello et al., "Audio-Based Cover Song Retrieval Using Approximate Chord Sequences: Testing Shifts, Gaps, Swaps and Beats," in Proc. ISMIR, 2007.
[5] E. Gómez et al., "Estimating the Tonality of Polyphonic Audio Files: Cognitive Versus Machine Learning Modelling Strategies," in Proc. ISMIR, 2004.
[6] D. P. W. Ellis et al., "Chord Recognition and Segmentation Using EM-trained Hidden Markov Models," 2003.
[7] M. Müller et al., "Audio Matching via Chroma-Based Statistical Features," in Proc. ISMIR, 2005.
[8] G. Tzanetakis et al., "Polyphonic audio matching and alignment for music retrieval," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2003.
[9] X. Serra et al., "Chroma Binary Similarity and Local Alignment Applied to Cover Song Identification," IEEE Transactions on Audio, Speech, and Language Processing, 2008.
[10] D. P. W. Ellis et al., "Identifying 'Cover Songs' with Chroma Features and Dynamic Programming Beat Tracking," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2007.
[11] E. Gómez Gutiérrez et al., "Tonal description of music audio signals," 2006.
[12] M. Marolt et al., "A Mid-level Melody-based Representation for Calculating Audio Similarity," in Proc. ISMIR, 2006.
[13] R. O. Gjerdingen et al., "The Cognition of Basic Musical Structures," 2004.
[14] Ö. Izmirli et al., "Tonal Similarity from Audio Using a Template Based Attractor Model," in Proc. ISMIR, 2005.