Using duration models to reduce fragmentation in audio segmentation

We investigate explicit segment duration models in addressing the problem of fragmentation in musical audio segmentation. The resulting probabilistic models are optimised using Markov Chain Monte Carlo methods; in particular, we introduce a modification to Wolff’s algorithm to make it applicable to a segment classification model with an arbitrary duration prior. We apply this to a collection of pop songs, and show experimentally that the generated segmentations suffer much less from fragmentation than those produced by segmentation algorithms based on clustering, and are closer to an expert listener’s annotations, as evaluated by two different performance measures.

[1]  Y. Shoham Reasoning About Change: Time and Causation from the Standpoint of Artificial Intelligence , 1987 .

[2]  Ning Hu,et al.  Discovering Musical Structure in Audio Recordings , 2002, ICMAI.

[3]  James F. Allen Towards a General Theory of Action and Time , 1984, Artif. Intell..

[4]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[5]  Mark B. Sandler,et al.  A tutorial on onset detection in music signals , 2005, IEEE Transactions on Speech and Audio Processing.

[6]  Qian Huang,et al.  Quantitative methods of evaluating image segmentation , 1995, Proceedings., International Conference on Image Processing.

[7]  Jonathan Foote,et al.  Visualizing music and audio using self-similarity , 1999, MULTIMEDIA '99.

[8]  Wolff,et al.  Collective Monte Carlo updating for spin systems. , 1989, Physical review letters.

[9]  Beth Logan,et al.  Mel Frequency Cepstral Coefficients for Music Modeling , 2000, ISMIR.

[10]  Mohan S. Kankanhalli,et al.  Content-based music structure analysis with applications to music semantics understanding , 2004, MULTIMEDIA '04.

[11]  Nicola Orio,et al.  Experiments on Segmentation Techniques for Music Documents Indexing , 2005, ISMIR.

[12]  Gregory H. Wakefield,et al.  Mathematical representation of joint time-chroma distributions , 1999, Optics & Photonics.

[13]  Wang,et al.  Nonuniversal critical dynamics in Monte Carlo simulations. , 1987, Physical review letters.

[14]  Christian P. Robert,et al.  Monte Carlo Statistical Methods , 2005, Springer Texts in Statistics.

[15]  D. Ruelle,et al.  Recurrence Plots of Dynamical Systems , 1987 .

[16]  Malcolm D. Macleod,et al.  Onset Detection in Musical Audio Signals , 2003, ICMC.

[17]  Mark B. Sandler,et al.  Theory and Evaluation of a Bayesian Music Structure Extractor , 2005, ISMIR.

[18]  Beth Logan,et al.  Music summarization using key phrases , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[19]  Tom E. Bishop,et al.  Blind Image Restoration Using a Block-Stationary Signal Model , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[20]  François Pachet,et al.  "The way it Sounds": timbre models for analysis and retrieval of music signals , 2005, IEEE Transactions on Multimedia.

[21]  J. Stephen Downie,et al.  Evaluation of a simple and effective music information retrieval method , 2000, SIGIR '00.

[22]  Lie Lu,et al.  Repeating pattern discovery and structure analysis from acoustic music data , 2004, MIR '04.

[23]  Xavier Rodet,et al.  Toward Automatic Music Audio Summary Generation from Signal Analysis , 2002, ISMIR.

[24]  Chin-Hui Lee,et al.  On the asymptotic statistical behavior of empirical cepstral coefficients , 1993, IEEE Trans. Signal Process..

[25]  Christian P. Robert,et al.  Monte Carlo Statistical Methods (Springer Texts in Statistics) , 2005 .

[26]  M. Goto A chorus-section detecting method for musical audio signals , 2003, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684).

[27]  Judith C. Brown Calculation of a constant Q spectral transform , 1991 .

[28]  Antony Galton,et al.  Temporal logics and their applications , 1987 .

[29]  Joachim M. Buhmann,et al.  Histogram clustering for unsupervised image segmentation , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[30]  Masataka Goto,et al.  A chorus-section detecting method for musical audio signals , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[31]  Joachim M. Buhmann,et al.  Pairwise Data Clustering by Deterministic Annealing , 1997, IEEE Trans. Pattern Anal. Mach. Intell..