Adaptive Temporal Modeling of Audio Features in the Context of Music Structure Segmentation

This paper describes a method for automatically adapting the length of the temporal modeling applied to audio features in the context of music structure segmentation. By detecting regions of homogeneous acoustical content and abrupt changes in the audio feature sequence, we show that temporal modeling can be adapted to capture both fast- and slow-varying structural information in the audio signal. Evaluation of the method shows that temporal modeling is consistently adapted to different musical contexts, allowing for robust music structure segmentation while reducing dependence on parameter tuning.
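The idea of adapting the modeling length to the local content can be illustrated with a minimal sketch. It is not the method of the paper, whose details the abstract does not give; it assumes a simple stand-in in which changes are detected by comparing the mean of the frames before and after each position, and each frame is then averaged over a window that grows up to a maximum length but never crosses a detected change. The function names (`change_scores`, `detect_changes`, `adaptive_smooth`) and the parameters `half`, `threshold`, and `max_half` are hypothetical.

```python
import math


def _mean(block):
    """Column-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(block) for col in zip(*block)]


def change_scores(frames, half=2):
    """Score each index by the distance between the mean of the `half`
    frames before it and the mean of the `half` frames starting at it.
    Indices too close to either end keep a score of 0."""
    scores = [0.0] * len(frames)
    for t in range(half, len(frames) - half + 1):
        before = _mean(frames[t - half:t])
        after = _mean(frames[t:t + half])
        scores[t] = math.dist(before, after)
    return scores


def detect_changes(scores, threshold):
    """Return indices whose score is a local maximum above `threshold`
    (an assumed, hand-set value, unlike the paper's adaptive scheme)."""
    return [
        t for t in range(1, len(scores) - 1)
        if scores[t] > threshold
        and scores[t] >= scores[t - 1]
        and scores[t] > scores[t + 1]
    ]


def adaptive_smooth(frames, changes, max_half=4):
    """Average each frame over up to `max_half` neighbors on each side,
    truncating the window at the nearest detected change."""
    bounds = [0] + sorted(changes) + [len(frames)]
    smoothed = []
    for t in range(len(frames)):
        lo = max(b for b in bounds if b <= t)   # start of the segment holding t
        hi = min(b for b in bounds if b > t)    # end (exclusive) of that segment
        start = max(lo, t - max_half)
        end = min(hi, t + max_half + 1)
        smoothed.append(_mean(frames[start:end]))
    return smoothed


# Toy one-dimensional feature sequence with a single abrupt change at index 6.
frames = [[0.0]] * 6 + [[1.0]] * 6
changes = detect_changes(change_scores(frames), threshold=0.75)
smoothed = adaptive_smooth(frames, changes)
```

On this toy sequence the long window is used inside each homogeneous region, while frames next to the change are averaged only with frames from their own side, so the step is not smeared as it would be with a fixed-length window.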
