Paper Information - Separation of Unvoiced Fricatives in Singing Voice Mixtures with Semi-Supervised NMF

Separation of Unvoiced Fricatives in Singing Voice Mixtures with Semi-Supervised NMF

Separating the singing voice from a musical mixture is a widely addressed problem because of its many applications. However, most approaches do not handle unvoiced consonant sounds, which degrades the output quality of any vocal source separation algorithm. The loss is especially noticeable for unvoiced fricatives (e.g. /T/ in "thing") because of their energy level and duration. Fricatives are consonants produced by forcing air through a narrow channel made by placing two articulators close together. We propose a method to model and separate unvoiced fricative consonants based on semi-supervised Non-negative Matrix Factorization (NMF), in which a set of spectral basis components is learnt from a training excerpt. We implemented this method as an extension of an existing, well-known factorization approach for singing voice (SIMM). An objective evaluation shows a small improvement in the separation results. Informal listening tests show a significant increase in the intelligibility of the isolated vocals.
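The semi-supervised idea in the abstract (bases learnt from a training excerpt, then held fixed while the rest of the mixture is factorized) can be illustrated with a minimal sketch. This is not the SIMM-based implementation the paper describes: it assumes plain NMF with Kullback-Leibler multiplicative updates on magnitude spectrograms and a soft mask for reconstruction, and the names `nmf`, `separate_fricatives`, `W_fric`, and `n_free` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def nmf(V, n_components, W_fixed=None, n_iter=200, seed=0):
    """KL-divergence NMF with multiplicative updates (illustrative only).

    V is a non-negative (bins x frames) magnitude spectrogram.
    If W_fixed is given, its columns are kept constant and n_components
    additional free columns are learnt next to them (semi-supervised case).
    Returns W (bins x components) and H (components x frames).
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    W_free = rng.random((n_bins, n_components)) + EPS
    n_fixed = 0 if W_fixed is None else W_fixed.shape[1]
    W = W_free if W_fixed is None else np.hstack([W_fixed, W_free])
    H = rng.random((W.shape[1], n_frames)) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Update all activations.
        WH = W @ H + EPS
        H *= (W.T @ (V / WH)) / (W.T @ ones + EPS)
        # Update only the free bases; the pre-trained ones stay untouched.
        WH = W @ H + EPS
        W_new = W * ((V / WH) @ H.T) / (ones @ H.T + EPS)
        W[:, n_fixed:] = W_new[:, n_fixed:]
    return W, H


def separate_fricatives(V_mix, W_fric, n_free=8, n_iter=200):
    """Estimate the fricative part of a mixture magnitude spectrogram.

    W_fric holds bases learnt beforehand from a fricative-only excerpt.
    The n_free extra bases absorb everything else in the mixture.
    """
    W, H = nmf(V_mix, n_free, W_fixed=W_fric, n_iter=n_iter)
    k = W_fric.shape[1]
    V_fric = W[:, :k] @ H[:k]   # contribution of the fixed bases
    V_hat = W @ H + EPS         # full model of the mixture
    mask = V_fric / V_hat       # soft mask with values in [0, 1]
    return mask * V_mix
```

In use, one would first call `nmf` on the magnitude spectrogram of a fricative-only training excerpt to obtain `W_fric`, then pass the mixture spectrogram and `W_fric` to `separate_fricatives`. Computing the spectrograms and inverting the masked result back to audio is left out of the sketch.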

Jordi Janer | Ricard Marxer
