Towards Quantitative Measures of Evaluating Song Segmentation

Automatic music structure analysis or song segmentation has immediate applications in the field of music information retrieval. Among these applications is active music navigation, automatic generation of audio summaries, automatic music analysis, etc. One of the important aspects of a song segmentation task is its evaluation. Commonly, that implies comparing the automatically estimated segmentation with a ground-truth, annotated by human experts. The automatic evaluation of segmentation algorithms provides the quantitative measure that reflects how well the estimated segmentation matches the annotated ground-truth. In this paper we present a novel evaluation measure based on informationtheoretic conditional entropy. The principal advantage of the proposed approach lies in the applied normalization, which enables the comparison of the automatic evaluation results, obtained for songs with a different amount of states. We discuss and compare the evaluation scores commonly used for evaluating song segmentation at present. We provide several examples illustrating the behavior of different evaluation measures and weigh the benefits of the presented metric against the others.

[1]  Mark B. Sandler,et al.  Theory and Evaluation of a Bayesian Music Structure Extractor , 2005, ISMIR.

[2]  Geoffroy Peeters Sequence Representation of Music Structure Using Higher-Order Similarity Matrix and Maximum-Likelihood Approach , 2007, ISMIR.

[3]  Hervé Bourlard,et al.  Unknown-multiple speaker clustering using HMM , 2002, INTERSPEECH.

[4]  Mark B. Sandler,et al.  Structural Segmentation of Musical Audio by Constrained Clustering , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[5]  Masataka Goto,et al.  A chorus section detection method for musical audio signals and its application to a music listening station , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[6]  A. Klapuri,et al.  Music structure analysis by finding repeated parts , 2006, AMCMM '06.

[7]  Mark B. Sandler,et al.  Using duration models to reduce fragmentation in audio segmentation , 2006, Machine Learning.

[8]  Geoffroy Peeters Deriving Musical Structures from Signal Analysis for Music Audio Summary Generation: "Sequence" and "State" Approach , 2003, CMMR.

[9]  Qian Huang,et al.  Quantitative methods of evaluating image segmentation , 1995, Proceedings., International Conference on Image Processing.

[10]  Herbert Gish,et al.  Clustering speakers by their voices , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[11]  Alexander Dekhtyar,et al.  Information Retrieval , 2018, Lecture Notes in Computer Science.