Dynamic texture models of music

In this paper, we consider representing a musical signal as a dynamic texture, a model that captures both the timbral and rhythmic qualities of sound. We apply this representation to the task of automatic song segmentation. In particular, we cluster sequences of audio feature vectors extracted from the song using a dynamic texture mixture (DTM) model. We show that the DTM model can both detect transition boundaries and accurately cluster coherent segments. The similarities between the dynamic textures that define these segments are based on both timbral and rhythmic qualities of the music, indicating that the DTM model simultaneously captures two of the important aspects required for automatic music analysis.
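For context, a dynamic texture is typically formulated as a linear dynamical system: a hidden state evolves as a first-order Gauss-Markov process, and each observed audio feature vector is a noisy linear function of that state. A dynamic texture mixture adds a discrete assignment variable that selects one of K component systems. The equations below are a minimal sketch of that standard formulation; the symbols (A, C, Q, R, pi) follow common convention and are not necessarily the notation used in this paper:

\begin{align}
z &\sim \mathrm{Multinomial}(\pi_1, \dots, \pi_K), \\
x_{t+1} &= A_z x_t + v_t, \qquad v_t \sim \mathcal{N}(0, Q_z), \\
y_t &= C_z x_t + w_t, \qquad w_t \sim \mathcal{N}(0, R_z),
\end{align}

where $y_t$ is the audio feature vector observed at frame $t$, $x_t$ is the hidden state summarizing the recent temporal evolution, and each component's parameters $\{A_z, C_z, Q_z, R_z\}$ are typically estimated with an EM algorithm. Segmentation then follows by assigning each extracted feature sequence to its most likely mixture component, so that contiguous frames explained by the same component form a song segment.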
