Using quadratic programming to estimate feature relevance in structural analyses of music

Self-similarity matrices (SSMs) are commonly used to visualize and estimate musical structure, since they reveal repeated patterns and contrasting sections. We introduce a novel application for SSMs derived from audio recordings: using them to learn about the potential reasoning behind a listener's annotation. We use SSMs generated from musically motivated audio features at various timescales to represent contributions to a structural annotation. Since a listener's attention can shift among musical features (e.g., rhythm, timbre, and harmony) throughout a piece, we further break down the SSMs into section-wise components and use quadratic programming (QP) to minimize the distance between a weighted linear sum of these components and the annotated description. We posit that the optimal section-wise weights on the feature components may indicate the features to which a listener attended when annotating a piece, and thus may help us to understand why two listeners disagreed about a piece's structure. We discuss some examples that substantiate the claim that feature relevance varies throughout a piece, use our method to investigate differences between listeners' interpretations, and lastly propose some variations on our method.
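
The fitting step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a section-wise component keeps only the rows and columns of a feature SSM that fall inside one annotated section, that the distance is the squared Euclidean distance, and that the weights are constrained to be non-negative, in which case the QP reduces to non-negative least squares. The function names `section_components` and `estimate_weights` and the toy matrices are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def section_components(feature_ssms, boundaries):
    """Split each feature SSM into one component per annotated section.

    Assumption: a component keeps the rows and columns of the SSM that
    belong to one section (a cross-shaped stripe) and is zero elsewhere.
    """
    n = feature_ssms[0].shape[0]
    components, labels = [], []
    for f, ssm in enumerate(feature_ssms):
        for s in range(len(boundaries) - 1):
            start, end = boundaries[s], boundaries[s + 1]
            mask = np.zeros((n, n), dtype=bool)
            mask[start:end, :] = True
            mask[:, start:end] = True
            components.append(np.where(mask, ssm, 0.0))
            labels.append((f, s))  # (feature index, section index)
    return components, labels


def estimate_weights(components, annotation_ssm):
    """Minimize ||sum_k w_k * C_k - annotation||^2 subject to w_k >= 0.

    This is a QP with a non-negativity constraint, solved here as
    non-negative least squares on the flattened matrices.
    """
    A = np.column_stack([c.ravel() for c in components])
    b = annotation_ssm.ravel().astype(float)
    weights, residual = nnls(A, b)
    return weights, residual


# Toy data (hypothetical): 4 frames, two annotated sections of 2 frames each.
block, eye, zero = np.ones((2, 2)), np.eye(2), np.zeros((2, 2))
annotation = np.block([[block, zero], [zero, block]])
# Feature 0 captures the homogeneity of section 0 only; feature 1 of section 1 only.
feature_a = np.block([[block, zero], [zero, eye]])
feature_b = np.block([[eye, zero], [zero, block]])

components, labels = section_components([feature_a, feature_b], [0, 2, 4])
weights, residual = estimate_weights(components, annotation)
for (f, s), w in zip(labels, weights):
    print(f"feature {f}, section {s}: weight {w:.2f}")
```

In this toy case the fitted weights should be large for feature 0 in the first section and for feature 1 in the second, and near zero otherwise, which is the kind of section-dependent feature relevance the abstract describes. Real SSMs would come from audio features computed at several timescales, so the fit would generally leave a nonzero residual.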
