Probabilistic Segmentation of Folk Music Recordings

This paper presents a novel method for automatic segmentation of folk music field recordings. The method is based on a distance measure that uses dynamic time warping to cope with tempo variations and a dynamic programming approach to handle pitch drift; the measure is used to find similarities and to estimate the length of the repeating segments. A probabilistic framework based on a hidden Markov model (HMM) is used to find segment boundaries by searching for the optimal match between the expected segment length, between-segment similarities, and likely locations of segment beginnings. Several current state-of-the-art approaches for segmentation of commercial music are evaluated, and their weaknesses when dealing with folk music, such as intolerance to pitch drift and variable tempo, are exposed. The proposed method is evaluated and its performance analyzed on a collection of 206 folk songs of different ensemble types: solo, two- and three-voiced, choir, instrumental, and instrumental with singing. It outperforms current commercial music segmentation methods for noninstrumental music and is on a par with the best for instrumental recordings. The method is also comparable to a more specialized method for segmentation of solo singing folk music recordings.
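To illustrate why dynamic time warping tolerates tempo variation, the following is a minimal sketch of the textbook DTW recurrence on one-dimensional sequences. It is not the paper's distance measure: the function name `dtw_distance`, the absolute-difference local cost, and the toy pitch sequences are assumptions made for illustration only, and pitch drift is not handled here.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two 1-D sequences (hypothetical helper)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            # Allowed steps: advance in a, advance in b, or advance in both.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Toy data (illustrative values, not taken from the paper's collection).
melody = [60, 62, 64, 62, 60]
slower = [60, 60, 62, 62, 64, 64, 62, 62, 60, 60]  # same contour, half tempo
other = [60, 65, 67, 65, 60]                        # different contour

print(dtw_distance(melody, slower))
print(dtw_distance(melody, other))
```

In this sketch the half-tempo repetition aligns with the original at zero cost, while the different contour accumulates a positive cost, which is the property a repetition-based segmentation can exploit when singers vary their tempo between stanzas.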
