Evaluating Hierarchical Structure in Music Annotations

Music exhibits structure at multiple scales, ranging from motifs to large-scale functional components. When inferring the structure of a piece, different listeners may attend to different temporal scales, which can result in disagreements when they describe the same piece. In the field of music informatics research (MIR), it is common to use corpora annotated with structural boundaries at different levels. By quantifying disagreements between multiple annotators, previous research has yielded several insights relevant to the study of music cognition. First, annotators tend to disagree when structural boundaries are ambiguous. Second, this ambiguity seems to depend on musical features, time scale, and genre. Furthermore, it is possible to tune current annotation evaluation metrics to better align with these perceptual differences. However, previous work has not directly analyzed the effects of hierarchical structure, because the existing methods for comparing structural annotations are designed for “flat” descriptions and do not readily generalize to hierarchical annotations. In this paper, we extend and generalize previous work on the evaluation of hierarchical descriptions of musical structure. We derive an evaluation metric which can compare hierarchical annotations holistically across multiple levels. Using this metric, we investigate inter-annotator agreement on the multilevel annotations of two different music corpora, analyze the influence of acoustic properties on hierarchical annotations, and evaluate existing hierarchical segmentation algorithms against the distribution of inter-annotator agreement.
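To make the evaluation setting concrete, the sketch below shows how two hierarchical annotations of the same piece might be compared holistically across levels. It is a minimal illustration, assuming the `mir_eval.hierarchy` module provides an `lmeasure` function of this kind; the toy intervals and labels are invented for demonstration and are not taken from the corpora discussed above.

```python
# Minimal sketch: comparing two hierarchical annotations holistically.
# Assumes mir_eval provides mir_eval.hierarchy.lmeasure; the annotations
# below are invented toy data, not real corpus annotations.
import numpy as np
import mir_eval.hierarchy

# Each hierarchy is a list of levels, ordered coarse to fine.
# A level is an (n, 2) array of [start, end] times (seconds) plus one
# label per segment.
ref_intervals = [
    np.array([[0.0, 6.0], [6.0, 12.0]]),                          # coarse
    np.array([[0.0, 3.0], [3.0, 6.0], [6.0, 9.0], [9.0, 12.0]]),  # fine
]
ref_labels = [["A", "B"], ["a", "a'", "b", "b'"]]

est_intervals = [
    np.array([[0.0, 12.0]]),                                       # coarse
    np.array([[0.0, 6.0], [6.0, 12.0]]),                           # fine
]
est_labels = [["A"], ["a", "b"]]

# L-precision / L-recall / L-measure: agreement on how deeply pairs of
# time points are grouped together, taken over all levels of each hierarchy.
l_p, l_r, l_f = mir_eval.hierarchy.lmeasure(
    ref_intervals, ref_labels, est_intervals, est_labels)
print(f"L-precision={l_p:.3f}  L-recall={l_r:.3f}  L-measure={l_f:.3f}")
```

Unlike level-by-level boundary metrics, a comparison of this form does not require the two annotations to have the same number of levels, which is what allows inter-annotator agreement to be measured directly on multilevel annotations.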
