A Framework for Multi-f0 Modeling in SATB Choir Recordings

Fundamental frequency (f0) modeling is an important but relatively unexplored aspect of choir singing. Performance evaluation and auditory analysis of singing, whether by an individual or by a choir, often depend on extracting f0 contours of the singing voice. However, because many singers perform in a similar frequency range, extracting the exact individual pitch contours from choir recordings is a challenging task. In this paper, we address this task and develop a methodology for modeling pitch contours of SATB choir recordings. A typical SATB choir consists of four parts, each covering a distinct pitch range and often sung by several singers. We first evaluate several state-of-the-art multi-f0 estimation systems for the particular case of choirs with a single singer per part and observe that the pitch of individual singers can be estimated with relatively high accuracy. The scenario of multiple singers per choir part (i.e., unison singing), however, is far more challenging. We therefore propose a methodology that combines deep-learning-based multi-f0 estimation with a set of traditional DSP techniques to model the f0 and its dispersion, rather than a single f0 trajectory, for each choir part. We present and discuss our observations and test our framework with different singer configurations.
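
The abstract does not detail the post-processing stage, so the following is only a minimal sketch of the general idea of summarizing a unison part by a central f0 and a dispersion value. The helper names (part_f0_and_dispersion, hz_to_cents, PART_RANGES) and the pitch bounds for each part are illustrative assumptions, not the paper's actual implementation; the frame-wise f0 candidates are assumed to come from some multi-f0 estimator such as a deep-salience model.

```python
import numpy as np

# Approximate pitch ranges (Hz) per SATB part; these bounds are
# illustrative assumptions, not values taken from the paper.
PART_RANGES = {
    "soprano": (260.0, 1050.0),
    "alto":    (190.0, 700.0),
    "tenor":   (130.0, 520.0),
    "bass":    (80.0, 330.0),
}

def hz_to_cents(f_hz, ref_hz=440.0):
    """Convert frequency in Hz to cents relative to a reference pitch."""
    return 1200.0 * np.log2(np.asarray(f_hz, dtype=float) / ref_hz)

def part_f0_and_dispersion(frame_f0s, low_hz, high_hz):
    """Summarize one choir part for a single analysis frame.

    frame_f0s : iterable of f0 candidates (Hz) returned by a multi-f0
                estimator for this frame.
    Returns (center_hz, dispersion_cents); NaNs if no candidate falls
    inside the part's pitch range.
    """
    f0s = np.asarray([f for f in frame_f0s if low_hz <= f <= high_hz])
    if f0s.size == 0:
        return np.nan, np.nan
    cents = hz_to_cents(f0s)
    center_cents = cents.mean()
    dispersion_cents = cents.std()  # spread of the unison, in cents
    center_hz = 440.0 * 2.0 ** (center_cents / 1200.0)
    return center_hz, dispersion_cents

# Toy frame: four sopranos singing "in unison" around A4, slightly detuned;
# the 220 Hz candidate belongs to another part and is filtered out.
frame = [438.0, 440.5, 442.0, 444.0, 220.0]
print(part_f0_and_dispersion(frame, *PART_RANGES["soprano"]))
```

Working in cents rather than Hz makes the dispersion value comparable across parts with different registers; assigning candidates to parts by fixed pitch ranges is a simplification, since adjacent SATB ranges overlap and a real system would need additional logic to resolve such ambiguities.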
