Structural Segmentation of Musical Audio by Constrained Clustering

We describe a method for segmenting musical audio into structural sections based on a hierarchical labeling of spectral features. Frames of audio are first labeled as belonging to one of a number of discrete states using a hidden Markov model trained on the features. Histograms of the states of neighboring frames are then clustered into segment types representing distinct distributions of states, using a clustering algorithm in which temporal continuity is expressed as a set of constraints modeled by a hidden Markov random field. Experimental results show that in many cases the resulting segmentations correspond well to conventional notions of musical form. We further show how the constrained clustering approach can readily be extended to include prior musical knowledge, input from other machine approaches, or semi-supervision.
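The second stage described above can be illustrated with a minimal sketch. It assumes the per-frame state labels from the hidden Markov model are already available as an integer array, and it substitutes a simple iterated-conditional-modes sweep with a fixed penalty for label changes between adjacent frames in place of the hidden Markov random field formulation, so it should be read as an approximation of the idea rather than the method itself. The function names, the window width, the penalty value, the KL-divergence cost, and the farthest-point initialisation are all assumptions introduced for illustration.

```python
import numpy as np

def neighborhood_histograms(states, n_states, width):
    """Normalized histogram of state labels in a window around each frame."""
    T = len(states)
    one_hot = np.eye(n_states)[states]              # shape (T, n_states)
    hists = np.empty((T, n_states))
    for t in range(T):
        lo, hi = max(0, t - width), min(T, t + width + 1)
        hists[t] = one_hot[lo:hi].mean(axis=0)      # each row sums to 1
    return hists

def kl_cost(hists, centroids, eps=1e-9):
    """KL divergence from every frame histogram to every centroid, shape (T, K)."""
    log_ratio = np.log(hists[:, None, :] + eps) - np.log(centroids[None, :, :] + eps)
    return (hists[:, None, :] * log_ratio).sum(axis=2)

def constrained_cluster(hists, n_types, penalty=0.5, n_iter=20):
    """Cluster histograms into segment types, penalizing label changes
    between adjacent frames (a stand-in for the continuity constraints)."""
    T = len(hists)
    # Farthest-point initialisation of the segment-type centroids (assumption).
    centroids = [hists[0]]
    for _ in range(1, n_types):
        dist = np.min([((hists - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(hists[int(np.argmax(dist))])
    centroids = np.array(centroids)
    labels = kl_cost(hists, centroids).argmin(axis=1)
    types = np.arange(n_types)
    for _ in range(n_iter):
        cost = kl_cost(hists, centroids)
        changed = False
        # One sweep: each frame takes the type minimizing data cost plus
        # a penalty for disagreeing with its left and right neighbors.
        for t in range(T):
            local = cost[t].copy()
            if t > 0:
                local += penalty * (types != labels[t - 1])
            if t < T - 1:
                local += penalty * (types != labels[t + 1])
            best = int(np.argmin(local))
            if best != labels[t]:
                labels[t] = best
                changed = True
        # Re-estimate each centroid as the mean histogram of its frames.
        for k in range(n_types):
            if np.any(labels == k):
                centroids[k] = hists[labels == k].mean(axis=0)
        if not changed:
            break
    return labels
```

Calling `neighborhood_histograms` on a state sequence and passing the result to `constrained_cluster` yields one segment-type label per frame; runs of equal labels can then be read off as candidate sections. Raising `penalty` favors longer, less fragmented segments at the cost of slower response to genuine section changes.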
