A Robust Mid-Level Representation for Harmonic Content in Music Signals

When considering the problem of audio-to-audio matching, determining musical similarity from low-level features such as Fourier transforms and MFCCs is extremely difficult, as these features carry little semantic information. Full semantic transcription of audio is unreliable and imperfect at best, and an unsolved problem at worst. We therefore propose a robust mid-level representation that incorporates both harmonic and rhythmic information without attempting full transcription. We describe a process for creating this representation automatically, directly from multi-timbral and polyphonic music signals, with an emphasis on popular music, and we offer several evaluations of our techniques. More so than most approaches that work from raw audio, we incorporate musical knowledge into our assumptions, our models, and our processes. Our hope is that this notion of a musically motivated mid-level representation will help bridge the gap between symbolic and audio research.

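The abstract itself gives no implementation details, but a minimal sketch of a feature in this spirit might pair a chroma (pitch-class) representation of harmonic content with beat tracking, yielding one harmonic profile per rhythmic segment. The sketch below assumes the librosa library and an input file name chosen for illustration; the specific functions and parameters are assumptions, not the authors' actual pipeline.

```python
# Illustrative sketch only (assumes librosa is installed): compute a
# beat-synchronous chromagram, i.e. pitch-class content summarised over
# rhythmically meaningful segments -- one plausible reading of a harmonic
# + rhythmic "mid-level representation", not the paper's exact method.
import numpy as np
import librosa

def beat_synchronous_chroma(path: str) -> np.ndarray:
    """Return a 12 x n_beats matrix: median chroma per inter-beat segment."""
    y, sr = librosa.load(path, sr=None, mono=True)       # raw audio signal
    # Constant-Q-based chroma: spectral energy folded into 12 pitch classes.
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
    # Beat tracking gives the frame indices of estimated beats.
    _, beats = librosa.beat.beat_track(y=y, sr=sr)
    # Aggregate chroma frames between consecutive beats (median is robust
    # to transients), giving one harmonic profile per beat segment.
    sync = librosa.util.sync(chroma, beats, aggregate=np.median)
    # Normalise each column so profiles are comparable across loudness levels.
    return librosa.util.normalize(sync, norm=np.inf, axis=0)

if __name__ == "__main__":
    features = beat_synchronous_chroma("example.wav")    # hypothetical input
    print(features.shape)                                 # (12, n_beat_segments)
```

Such beat-synchronous pitch-class profiles discard timbre and exact pitch height, which is why they can be compared across different recordings without any transcription step.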