Robust Segmentation and Annotation of Folk Song Recordings

Even though folk songs have been passed down mainly by oral tradition, most musicologists study the relations between folk songs on the basis of score-based transcriptions. Because audio recordings are complex to work with, the original recorded tunes are often no longer studied in folk song research once transcriptions are available, although they may still contain valuable information. In this paper, we introduce an automated approach for segmenting folk song recordings into their constituent stanzas, which can then be made accessible to folk song researchers by means of suitable visualization, searching, and navigation interfaces. The recordings were performed by elderly non-professional singers, and the main challenge is that most singers have serious intonation problems, with their voices fluctuating by as much as several semitones over the course of a song. Using a combination of robust audio features along with various cleaning and audio matching strategies, our approach yields accurate segmentations even in the presence of strong deviations.
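To illustrate how a matching step can tolerate pitch drift of whole semitones, the following is a minimal sketch and not the procedure of the paper. It assumes, hypothetically, that each audio frame is summarized by a 12-dimensional pitch-class (chroma-like) vector; the abstract itself does not name the features. The function names `cosine_distance` and `transposition_invariant_distance` are introduced here for illustration only.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors (1.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def transposition_invariant_distance(a, b):
    """Compare a against every cyclic shift of b.

    Returns (smallest distance, shift that achieved it). A shift of k
    compensates for b being k pitch classes (semitones) higher than a.
    """
    best_distance, best_shift = float("inf"), 0
    for shift in range(len(b)):
        shifted = b[shift:] + b[:shift]  # rotate b left by `shift` bins
        distance = cosine_distance(a, shifted)
        if distance < best_distance:
            best_distance, best_shift = distance, shift
    return best_distance, best_shift


# Assumed toy data: a reference frame and the same frame sung two semitones higher.
reference = [1.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.0]
drifted = reference[-2:] + reference[:-2]

distance, shift = transposition_invariant_distance(reference, drifted)
```

In this toy case the plain cosine distance between `reference` and `drifted` is large, while the shift-minimized distance is close to zero at a shift of two bins. Drift smaller than a semitone would need a finer pitch resolution than the twelve bins assumed here.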
