Chroma-based statistical audio features for audio matching

Large music collections often contain several recordings of the same piece of music, interpreted by different musicians and possibly arranged for different instrumentations. Given a short query audio clip, an important task in audio retrieval is to automatically and efficiently identify all corresponding audio clips, irrespective of the specific interpretation or instrumentation. To address this problem, also referred to as audio matching, the main contribution of this paper is a new type of audio feature that correlates strongly with the harmonic progression of the audio signal. In addition, the feature shows a high degree of robustness to variations in parameters such as dynamics, timbre, articulation, and local tempo. The feature design is carried out in two stages, which essentially take short-time statistics over chroma-based energy distributions. Here, the chroma correspond to the 12 traditional pitch classes of the equal-tempered scale. Applied to audio matching on a large database covering a wide range of classical music (112 hours of audio material), our features proved to be a powerful tool, providing accurate matches efficiently with respect to both time and memory requirements.
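As a rough illustration of the idea of short-time statistics over chroma-based energy distributions, the following Python sketch folds per-pitch energies into the 12 pitch classes, converts each frame into a relative energy distribution, and then averages over a short window before downsampling. It is a minimal sketch under stated assumptions, not the paper's actual procedure: the function names, the MIDI-indexed input layout, the window and hop sizes, the Hann-shaped weighting, and the inner-product distance used for matching are all hypothetical choices made here for illustration.

```python
import numpy as np


def chroma_from_pitch_energy(pitch_energy):
    """Fold a (n_pitches, n_frames) array of per-MIDI-pitch energies
    into the 12 pitch classes (assumed input layout: row index = MIDI pitch)."""
    pitch_energy = np.asarray(pitch_energy, dtype=float)
    chroma = np.zeros((12, pitch_energy.shape[1]))
    for pitch in range(pitch_energy.shape[0]):
        chroma[pitch % 12] += pitch_energy[pitch]
    return chroma


def short_time_chroma_statistics(chroma, window=5, hop=2):
    """Two illustrative stages (window and hop are assumed values;
    the input should have at least `window` frames)."""
    chroma = np.asarray(chroma, dtype=float)

    # Stage 1: relative energy distribution per frame. Dividing by the
    # frame total removes the dependence on absolute loudness.
    totals = chroma.sum(axis=0, keepdims=True)
    dist = np.divide(chroma, totals,
                     out=np.full_like(chroma, 1.0 / 12.0),
                     where=totals > 0)

    # Stage 2: weighted average over a short window, then downsampling.
    # Averaging over several frames smooths out articulation and small
    # local tempo differences.
    kernel = np.hanning(window + 2)[1:-1]
    kernel /= kernel.sum()
    smoothed = np.stack([np.convolve(row, kernel, mode="same")
                         for row in dist])
    feats = smoothed[:, ::hop]

    # Normalize each feature vector to unit Euclidean length.
    norms = np.linalg.norm(feats, axis=0, keepdims=True)
    return feats / np.maximum(norms, 1e-12)


def matching_curve(query, database):
    """Slide the query over the database features and return, for each
    offset, one minus the mean inner product (an assumed distance)."""
    n_q, n_d = query.shape[1], database.shape[1]
    return np.array([
        1.0 - np.mean(np.sum(query * database[:, i:i + n_q], axis=0))
        for i in range(n_d - n_q + 1)
    ])
```

Because each frame is reduced to a distribution over pitch classes, scaling the input energy or moving a chord by an octave leaves the resulting features unchanged, which is one simple way to see why features of this kind can be insensitive to dynamics and, to some extent, to instrumentation. Low values of the matching curve indicate offsets where the database is harmonically similar to the query.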
