Separation of Unvoiced Fricatives in Singing Voice Mixtures with Semi-Supervised NMF

Separating the singing voice from a musical mixture is a problem widely addressed due to its various applications. However, most approaches do not tackle the separation of unvoiced consonant sounds, which causes a loss of quality in any vocal source separation algorithm, and is especially noticeable for unvoiced fricatives (e.g. /T/ in thing) due to their energy level and duration. Fricatives are consonants produced by forcing air through a narrow channel made by placing two articulators close together. We propose a method to model and separate unvoiced fricative consonants based on a semisupervised Non-negative Matrix Factorization, in which a set of spectral basis components are learnt from a training excerpt. We implemented this method as an extension of an existing well-known factorization approach for singing voice (SIMM). An objective evaluation shows a small improvement in the separation results. Informal listening tests show a significant increase of intelligibility in the isolated vocals.

[1]  Anssi Klapuri,et al.  Musical Instrument Recognition in Polyphonic Audio Using Source-Filter Model for Sound Separation , 2009, ISMIR.

[2]  Roland Badeau,et al.  Time-dependent parametric and harmonic templates in non-negative matrix factorization , 2010 .

[3]  Gaël Richard,et al.  An iterative approach to monaural musical mixture de-soloing , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[4]  Meinard Müller,et al.  Score-Informed Voice Separation For Piano Recordings , 2011, ISMIR.

[5]  Emmanuel Vincent,et al.  Subjective and Objective Quality Assessment of Audio Source Separation , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[6]  Gaël Richard,et al.  A Musically Motivated Mid-Level Representation for Pitch Estimation and Musical Audio Source Separation , 2011, IEEE Journal of Selected Topics in Signal Processing.

[7]  Jordi Bonada,et al.  Low-Latency Instrument Separation in Polyphonic Audio Using Timbre Models , 2012, LVA/ICA.

[8]  Jyh-Shing Roger Jang,et al.  On the Improvement of Singing Voice Separation for Monaural Recordings Using the MIR-1K Dataset , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[9]  Kin Hong Wong,et al.  Automatic lyrics alignment for Cantonese popular music , 2006, Multimedia Systems.

[10]  Jordi Janer,et al.  Combining a harmonic-based NMF decomposition with transient analysis for instantaneous percussion separation , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[11]  Mikkel N. Schmidt,et al.  Single-channel speech separation using sparse non-negative matrix factorization , 2006, INTERSPEECH.

[12]  Bhiksha Raj,et al.  Phoneme-Dependent NMF for Speech Enhancement in Monaural Mixtures , 2011, INTERSPEECH.