Multi-layer Kullback-Leibler-based Complex NMF with LPC error clustering for blind source separation

In many applications, such as music transcription, audio forensics, and speech source separation, a single-channel (mono) recording must be decomposed into its constituent sources. Such techniques are usually referred to as blind source separation (BSS). Non-negative matrix factorization (NMF) has recently been applied to BSS in both supervised and unsupervised settings. In this paper, we propose a novel NMF-based algorithm, multi-layer KL-CNMF (Kullback-Leibler Complex NMF), which uses fuzzy initial clustering to improve BSS performance in the unsupervised mode. In addition, we use linear predictive coding (LPC) error clustering as a criterion for grouping the multi-layer KL-CNMF components into sources. This criterion is especially effective for harmonic signals such as certain speech sources. Experiments on speech mixtures from the TIMIT database, evaluated by signal-to-distortion ratio (SDR) and signal-to-interference ratio (SIR), show that the proposed system significantly outperforms the baseline, an NMF-based BSS system with LPC error clustering.
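The abstract names its building blocks without defining them. For rough orientation only, here is a sketch of a real-valued multi-layer NMF under the generalized Kullback-Leibler divergence. It uses the standard Lee-Seung multiplicative updates and a common multi-layer cascade, in which each layer factorizes the previous layer's activation matrix. This is not the proposed KL-CNMF. It ignores phase (complex NMF models it), uses random rather than fuzzy-clustering initialization, and omits the LPC error clustering step. The names `kl_nmf`, `kl_divergence` and `multilayer_kl_nmf` are hypothetical.

```python
import numpy as np

def kl_divergence(V, Vhat, eps=1e-12):
    """Generalized KL divergence D(V || Vhat) between non-negative matrices."""
    return float(np.sum(V * np.log((V + eps) / (Vhat + eps)) - V + Vhat))

def kl_nmf(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Factorize non-negative V (F x T) as W @ H with Lee-Seung
    multiplicative updates for the generalized KL divergence."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps   # random init (assumption; the paper uses fuzzy clustering)
    H = rng.random((rank, T)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)   # activation update
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (ones @ H.T + eps)   # basis update
    return W, H

def multilayer_kl_nmf(V, rank, n_layers=3, n_iter=200):
    """Cascade V ~ W1 H1, H1 ~ W2 H2, ...; overall basis is W1 W2 ... WL."""
    W_total = np.eye(V.shape[0])
    X = V
    for layer in range(n_layers):
        W, H = kl_nmf(X, rank, n_iter=n_iter, seed=layer)
        W_total = W_total @ W   # accumulate the product of per-layer bases
        X = H                   # next layer factorizes this layer's activations
    return W_total, X
```

With `V` a magnitude spectrogram, the columns of `W_total` play the role of spectral components. A clustering step, such as the LPC error criterion described above, would then assign them to sources.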
