Multi-Pitch Detection and Voice Assignment for A Cappella Recordings of Multiple Singers

This paper presents a multi-pitch detection and voice assignment method for audio recordings of a cappella performances by multiple singers. We propose a novel approach that combines an acoustic model for multi-pitch detection with a music language model for voice separation and assignment. The acoustic model is a spectrogram factorization process based on Probabilistic Latent Component Analysis (PLCA), driven by a six-dimensional dictionary of pre-learned spectral templates. The voice separation component is based on hidden Markov models that encode musicological assumptions. By integrating the two models, the system can detect multiple concurrent pitches in vocal music and assign each detected pitch to a specific voice corresponding to a voice type such as soprano, alto, tenor, or bass (SATB). This work focuses on four-part compositions; evaluations on recordings of Bach chorales and barbershop quartets show that our integrated approach achieves an F-measure of over 70% for frame-based multi-pitch detection and over 45% for four-voice assignment.
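To make the PLCA step concrete, the following is a minimal sketch in Python with NumPy under strong simplifying assumptions: the six-dimensional dictionary is collapsed into a single two-dimensional matrix W of frequency-by-pitch templates, and EM estimates only the per-frame pitch distribution P_t(p), without the additional dimensions of the full model or the HMM voice assignment stage. The function names plca_fixed_templates and frame_f_measure, the 0.5 activation threshold, and the synthetic templates are hypothetical illustrations, not the authors' implementation. The sketch also computes a standard frame-based F-measure (precision and recall over active pitch-frame pairs) of the kind reported for multi-pitch detection.

```python
import numpy as np


def plca_fixed_templates(V, W, n_iter=500, eps=1e-12, seed=0):
    """Estimate per-frame pitch distributions P_t(p) by PLCA with fixed templates.

    V : (F, T) non-negative spectrogram (frequency bins x frames).
    W : (F, P) spectral templates, one column per pitch (assumed pre-learned).
    Returns H : (P, T); column t is the estimated distribution P_t(p).
    """
    W = W / (W.sum(axis=0, keepdims=True) + eps)  # columns become P(omega | p)
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1]))
    H /= H.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior P_t(p | omega) = W[omega, p] H[p, t] / (W H)[omega, t].
        # M-step: weight the posterior by V and sum over omega, which in matrix
        # form is H * (W^T (V / (W H))); then renormalise each frame to sum to 1.
        R = V / (W @ H + eps)
        H = H * (W.T @ R)
        H /= H.sum(axis=0, keepdims=True) + eps
    return H


def frame_f_measure(est, ref):
    """Frame-based F-measure between two boolean (pitch x frame) piano rolls."""
    tp = np.sum(est & ref)
    fp = np.sum(est & ~ref)
    fn = np.sum(~est & ref)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Synthetic check: 4 hypothetical "pitch" templates, each with one dominant bin.
rng = np.random.default_rng(1)
F, P, T = 60, 4, 20
W_true = 0.05 * rng.random((F, P))
for p in range(P):
    W_true[10 + 10 * p, p] += 1.0
W_true /= W_true.sum(axis=0, keepdims=True)

ref = rng.random((P, T)) < 0.4                          # random binary piano roll
ref[rng.integers(0, P, size=T), np.arange(T)] = True    # at least one note per frame
V = W_true @ ref.astype(float)

H = plca_fixed_templates(V, W_true)
activation = H * V.sum(axis=0, keepdims=True)  # P(t) * P_t(p)
est = activation > 0.5                         # hypothetical threshold
print(frame_f_measure(est, ref))
```

Because W is held fixed, each frame is an independent problem whose log-likelihood is concave in P_t(p), and this EM update coincides with the multiplicative update for KL-divergence NMF with constant W; with linearly independent templates the result should therefore depend little on the random initialization.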
