
Joint Sound Source Separation and Speaker Recognition

Non-negative Matrix Factorization (NMF) has already been applied to learn speaker characterizations from single-speaker or non-simultaneous speech for speaker recognition. It is also known to perform well in (blind) source separation of simultaneous speech. This paper explains how NMF can be used to solve both problems jointly in a multichannel speaker recognizer for simultaneous speech. It is shown how state-of-the-art multichannel NMF for blind source separation can easily be extended to incorporate speaker recognition. Experiments on the CHiME corpus show that this method outperforms a sequential approach that first applies source separation and then performs speaker recognition with state-of-the-art i-vector techniques.

Hugo Van hamme | Jeroen Zegers
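The abstract's central idea, that a single NMF decomposition can serve both source separation and speaker recognition, can be illustrated with a minimal single-channel sketch. This is not the authors' multichannel algorithm. It assumes each enrolled speaker already has a pre-trained nonnegative spectral dictionary, fits activations to a mixture magnitude spectrogram with the standard Lee-Seung multiplicative update for the KL divergence, separates the sources with Wiener-like soft masks, and scores each speaker by the share of modelled energy that speaker's dictionary explains. The function names (kl_nmf_activations, separate_and_score) and the energy-share scoring rule are hypothetical choices made for illustration.

```python
import numpy as np


def kl_nmf_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    """Estimate activations H >= 0 with the dictionary W held fixed, using the
    Lee-Seung multiplicative update that decreases the KL divergence D(V || WH)."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps  # strictly positive start
    col_sums = W.sum(axis=0)[:, None] + eps          # normalizer W^T 1, shape (K, 1)
    for _ in range(n_iter):
        WH = W @ H + eps                             # current model of the mixture
        H *= (W.T @ (V / WH)) / col_sums             # multiplicative KL update
    return H


def separate_and_score(V, dictionaries, n_iter=200, eps=1e-12):
    """Decompose a mixture magnitude spectrogram V (F x N) over the concatenation
    of per-speaker dictionaries (each F x K_s, assumed pre-trained).

    Returns (estimates, scores): one masked spectrogram per speaker and a
    hypothetical per-speaker score equal to that speaker's share of modelled energy.
    """
    W = np.hstack(dictionaries)
    H = kl_nmf_activations(V, W, n_iter, eps)
    bounds = np.cumsum([0] + [D.shape[1] for D in dictionaries])
    # Reconstruct each speaker's contribution from its own block of activations.
    parts = [D @ H[bounds[s]:bounds[s + 1]] for s, D in enumerate(dictionaries)]
    total = sum(parts) + eps
    # Soft (Wiener-like) masks: each speaker receives its share of every bin.
    estimates = [V * p / total for p in parts]
    # Speaker evidence: fraction of the modelled energy explained per dictionary.
    energy = np.array([p.sum() for p in parts])
    return estimates, energy / energy.sum()
```

Because the masks sum to one in every time-frequency bin, the separated estimates add back up to the mixture, and an enrolled speaker whose dictionary is never activated receives a score near zero. A realistic system would additionally need a spatial (multichannel) model and could regularize the activations, for example with group sparsity per speaker.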
