Automated Segmentation of Folk Song Field Recordings

In this paper, we introduce an automated procedure for segmenting a folk song field recording into its constituent stanzas. One challenge is that these recordings are performed by elderly non-professional singers under poor recording conditions, so the stanzas of a recording may differ significantly from one another in their temporal and spectral properties. Unlike a previously described segmentation approach that relies on a manually transcribed reference stanza, our procedure is reference-free: it is driven by audio thumbnailing in combination with enhanced similarity matrices. Our experiments on a Dutch folk song collection show that the segmentation results are comparable to those obtained by the reference-based method.
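To make the two ingredients named above more concrete, the following is a minimal sketch, not the procedure evaluated in the paper. It assumes the recording has already been converted to a sequence of frame-wise feature vectors (for example chroma vectors). All function names are hypothetical. The enhancement shown is plain averaging along the main-diagonal direction, and the thumbnail score is a crude stand-in (mean similarity of a fixed-length segment to the whole recording) for a proper thumbnailing measure.

```python
import numpy as np

def self_similarity(features):
    """Cosine self-similarity matrix of a (num_frames, dim) feature sequence."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against all-zero frames
    return unit @ unit.T

def smooth_diagonals(ssm, length):
    """Average each entry with its successors along the main-diagonal direction.

    Entries beyond the matrix border are treated as zero (an assumption of
    this sketch), so values near the lower-right corner are damped.
    """
    n = ssm.shape[0]
    padded = np.zeros((n + length - 1, n + length - 1))
    padded[:n, :n] = ssm
    out = np.zeros_like(ssm, dtype=float)
    for k in range(length):
        out += padded[k:k + n, k:k + n]
    return out / length

def best_thumbnail_start(ssm, seg_len):
    """Start frame of the fixed-length segment most similar to the whole recording.

    Simplified illustrative score: the mean of the segment's rows in the matrix.
    """
    n = ssm.shape[0]
    scores = [ssm[s:s + seg_len, :].mean() for s in range(n - seg_len + 1)]
    return int(np.argmax(scores))
```

In a sketch like this, a repeated stanza shows up as stripes parallel to the main diagonal of the matrix. Smoothing along that direction strengthens such stripes relative to isolated similar frames, which is the general motivation for enhancing the matrix before searching for a thumbnail.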
