The Modulation Scale Spectrum and its Application to Rhythm-Content Description

In this paper, we propose the Modulation Scale Spectrum as an extension of the Modulation Spectrum into the Scale domain. The Modulation Spectrum uses a second Fourier Transform to express how the amplitude content of various frequency bands evolves over time. While it has proven useful in many applications, it is not scale-invariant. We therefore propose to use the Scale Transform instead of the second Fourier Transform. The Scale Transform is a special case of the Mellin Transform, and one of its properties is scale invariance: two time-stretched versions of the same music track have (almost) the same Scale Spectrum. Our proposed Modulation Scale Spectrum inherits this property while describing how frequency content evolves over time. We then propose a specific implementation of the Modulation Scale Spectrum to represent rhythm content; this representation is therefore tempo-independent. We evaluate the ability of this representation to capture rhythm characteristics on a classification task, and we demonstrate that for this task our proposed representation largely exceeds the results obtained so far while being highly tempo-independent.
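
The scale-invariance argument above can be illustrated numerically. The sketch assumes Cohen's definition of the scale transform, D(c) = (2π)^(-1/2) ∫₀^∞ x(t) t^(-jc-1/2) dt. Substituting u = at shows that a stretched copy y(t) = √a·x(at) satisfies D_y(c) = a^(jc)·D_x(c), so |D_y(c)| = |D_x(c)|: the stretch only changes the phase. The function scale_transform evaluates the integral directly after exponential resampling (t = e^τ), which is simple but slow and is not a fast Mellin algorithm. The function modulation_scale_spectrum is hypothetical. Its per-band envelope autocorrelation, its max_lag_s parameter and its unit-norm normalization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def scale_transform(t, x, c, n_log=4096):
    """Numerically approximate the scale transform (Cohen's definition)

        D(c) = 1/sqrt(2*pi) * integral_0^inf x(t) * t**(-1j*c - 0.5) dt

    via the substitution t = exp(tau), which turns D(c) into a Fourier
    integral of x(exp(tau)) * exp(tau / 2) over log-time tau.
    t must be strictly positive and increasing (at least two samples).
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    c = np.asarray(c, dtype=float)
    # Uniform grid in log-time spanning the sampled support.
    tau = np.linspace(np.log(t[0]), np.log(t[-1]), n_log)
    d_tau = tau[1] - tau[0]
    # Exponential resampling of x, plus the exp(tau/2) weight.
    x_log = np.interp(np.exp(tau), t, x) * np.exp(tau / 2.0)
    # Rectangle-rule evaluation of the Fourier integral at each scale c.
    kernel = np.exp(-1j * np.outer(c, tau))
    return kernel @ x_log * d_tau / np.sqrt(2.0 * np.pi)


def modulation_scale_spectrum(band_env, frame_rate, c, max_lag_s=4.0):
    """Hypothetical Modulation Scale Spectrum sketch (not the paper's code).

    band_env: (n_bands, n_frames) non-negative band envelopes sampled at
    frame_rate Hz. Each band's envelope is replaced by its normalized
    autocorrelation (assumption: a time-stretch of the envelope then
    becomes a stretch of the lag axis), and the magnitude of its scale
    transform is taken over lags in (0, max_lag_s] seconds.
    """
    band_env = np.asarray(band_env, dtype=float)
    n_bands, n_frames = band_env.shape
    max_lag = min(int(max_lag_s * frame_rate), n_frames - 1)
    lags = np.arange(1, max_lag + 1) / frame_rate  # lag 0 excluded (needs t > 0)
    out = np.zeros((n_bands, len(c)))
    for b in range(n_bands):
        e = band_env[b] - band_env[b].mean()
        r = np.correlate(e, e, mode="full")[n_frames - 1:]  # lags 0..n_frames-1
        if r[0] <= 0.0:  # constant envelope: no modulation to describe
            continue
        mag = np.abs(scale_transform(lags, r[1:max_lag + 1] / r[0], c))
        # Unit norm cancels the 1/sqrt(a) gain that a stretch f(a*l) introduces.
        out[b] = mag / np.linalg.norm(mag)
    return out


# Scale-invariance check: y(t) = sqrt(a) * x(a*t) is a time-compressed copy of x.
t = np.linspace(1e-6, 20.0, 200_001)
a = 1.25
x = np.exp(-t) * (1.0 + np.cos(2.0 * np.pi * t / 0.25))
y = np.sqrt(a) * np.exp(-a * t) * (1.0 + np.cos(2.0 * np.pi * a * t / 0.25))
c = np.linspace(0.0, 40.0, 81)
mx = np.abs(scale_transform(t, x, c))
my = np.abs(scale_transform(t, y, c))
print("max relative deviation of |D(c)|:", np.max(np.abs(mx - my)) / np.max(mx))
```

The script prints the largest deviation between the two scale magnitudes, relative to their peak. The deviation stays small because the stretch only alters the phase a^(jc). Plain Fourier magnitudes of x and y would not match, since |Y(f)| = a^(-1/2)·|X(f/a)| and the spectral peaks therefore move.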
