Music Structure Analysis Using a Probabilistic Fitness Measure and a Greedy Search Algorithm

This paper proposes a method for recovering the sectional form of a musical piece from an acoustic signal. The description of form consists of a segmentation of the piece into musical parts, a grouping of the segments that represent the same part, and an assignment of musically meaningful labels, such as "chorus" or "verse," to the groups. The method uses a fitness function over candidate descriptions to select the one that best matches the acoustic properties of the input piece. Different aspects of the input signal are described with three acoustic features: mel-frequency cepstral coefficients, chroma, and rhythmogram. The features are used to estimate the probability that two segments in the description are repeats of each other, and these probabilities determine the total fitness of the description. Creating the candidate descriptions is a combinatorial problem, and a novel greedy algorithm that constructs descriptions gradually is proposed to solve it. The group labeling utilizes a musicological model consisting of N-grams. The proposed method is evaluated on three data sets of musical pieces with manually annotated ground truth. The evaluations show that the proposed method recovers the structural description more accurately than the state-of-the-art reference method.
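
The idea of scoring a description by pairwise repeat probabilities and building it up greedily can be illustrated with a small sketch. The abstract does not give the exact fitness measure or search procedure, so everything below is an assumption made for illustration: the log-likelihood form of `fitness`, the fixed segment boundaries, and the one-segment-at-a-time strategy in `greedy_grouping` are hypothetical stand-ins, not the paper's algorithm.

```python
import math

def pair_score(p, same_group):
    """Log-score of one segment pair, given its repeat probability p.

    Assumed form: reward placing likely repeats in the same group and
    unlikely repeats in different groups.
    """
    eps = 1e-9
    p = min(max(p, eps), 1.0 - eps)  # keep the logarithm finite
    return math.log(p) if same_group else math.log(1.0 - p)

def fitness(groups, prob):
    """Total fitness of a (possibly partial) description.

    groups[i] is the group id of segment i; prob[i][j] is the estimated
    probability that segments i and j are repeats of each other.
    """
    total = 0.0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            total += pair_score(prob[i][j], groups[i] == groups[j])
    return total

def greedy_grouping(prob):
    """Build a description gradually, one segment at a time.

    Each segment joins the existing group, or opens the new group, that
    gives the highest fitness for the partial description so far.
    """
    groups = []
    for _ in range(len(prob)):
        existing = sorted(set(groups))
        candidates = existing + [len(existing)]  # last entry opens a new group
        best = max(candidates, key=lambda g: fitness(groups + [g], prob))
        groups.append(best)
    return groups

# Toy input: four segments in an A B A B pattern.
prob = [
    [1.0, 0.1, 0.9, 0.1],
    [0.1, 1.0, 0.1, 0.9],
    [0.9, 0.1, 1.0, 0.1],
    [0.1, 0.9, 0.1, 1.0],
]
print(greedy_grouping(prob))
```

In this toy setting the segment boundaries are taken as given and only the grouping is searched; the method described above also selects the segmentation itself and assigns labels to the groups, which this sketch leaves out.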
