Towards Structural Analysis of Audio Recordings in the Presence of Musical Variations

One major goal of structural analysis of an audio recording is to automatically extract the repetitive structure or, more generally, the musical form of the underlying piece of music. Recent approaches to this problem work well for music in which the repetitions largely agree with respect to instrumentation and tempo, as is typically the case for popular music. For other classes of music such as Western classical music, however, musically similar audio segments may exhibit significant variations in parameters such as dynamics, timbre, execution of note groups, modulation, articulation, and tempo progression. In this paper, we propose a robust and efficient algorithm for audio structure analysis that identifies musically similar segments even in the presence of large variations in these parameters. To account for such variations, our main idea is to incorporate invariance at various levels simultaneously: we design a new type of statistical features to absorb microvariations, introduce an enhanced local distance measure to account for local variations, and describe a new strategy for structure extraction that can cope with global variations. Our experimental results with classical and popular music show that our algorithm performs successfully even in the presence of significant musical variations.
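
The general idea of absorbing microvariations before comparing segments can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper: the windowed averaging, the cosine distance, the function names `smooth_and_downsample` and `self_similarity`, and the parameters `window` and `hop` are all assumptions chosen for illustration, and the input is random toy data standing in for a 12-dimensional chroma-like feature sequence with the form A B A.

```python
import numpy as np

def smooth_and_downsample(features, window=4, hop=2):
    """Average the features over a sliding window, then renormalize each frame.

    Hypothetical helper: averaging over `window` frames is one simple way to
    make features less sensitive to small local deviations.
    """
    n_frames = features.shape[0]
    starts = range(0, n_frames - window + 1, hop)
    pooled = np.array([features[s:s + window].mean(axis=0) for s in starts])
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.maximum(norms, 1e-12)

def self_similarity(features):
    """Cosine distance between all pairs of unit-norm frames (0 = identical)."""
    return 1.0 - features @ features.T

# Toy sequence with form A B A; the repeated A section is slightly perturbed.
rng = np.random.default_rng(0)
a = rng.random((8, 12))
b = rng.random((8, 12))
sequence = np.vstack([a, b, a + 0.01 * rng.random((8, 12))])

pooled = smooth_and_downsample(sequence)   # 24 frames -> 11 pooled frames
distance = self_similarity(pooled)         # 11 x 11 distance matrix
```

In the resulting matrix, the pooled frames of the repeated A section lie much closer to the first A section than the frames of B do, which shows up as a low-distance path parallel to the main diagonal. Extracting such paths when the repetition also differs in tempo or key requires more than this sketch provides.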
