Perceptual single-channel audio source separation by non-negative matrix factorization

This paper proposes a single-channel audio source decomposition method that integrates perceptual quality criteria into source separation. Unlike existing methods, the proposed method applies a perceptually weighted non-negative matrix factorization to the log-frequency spectrogram of the mixed signal. The weights are calculated adaptively for each critical band, based on the perceptual model described in the ITU-R BS.1387 perceptual quality standard. The sources are estimated by minimizing the weighted divergence between the observed log-frequency spectrogram and the model, and it is shown that the proposed adaptive weighting scheme significantly improves the quality of the estimated audio sources.
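
To make the idea of a weighted factorization concrete, the following is a minimal sketch of non-negative matrix factorization under a weighted divergence. It is not the method of this paper: the abstract does not state which divergence is used or how the weights are derived from ITU-R BS.1387, so the sketch assumes the generalized Kullback-Leibler divergence with standard multiplicative updates, and it uses arbitrary placeholder weights that are constant within each row (standing in for per-band weights). The names `weighted_kl`, `weighted_nmf`, `V`, `W`, `B`, and `G` are hypothetical.

```python
import numpy as np


def weighted_kl(V, V_hat, W, eps=1e-12):
    """Weighted generalized KL divergence between V and its model V_hat."""
    return float(np.sum(W * (V * np.log((V + eps) / (V_hat + eps)) - V + V_hat)))


def weighted_nmf(V, W, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor V ~ B @ G with element-wise weights W (assumed KL updates).

    V : (n_bins, n_frames) non-negative spectrogram
    W : (n_bins, n_frames) non-negative weights
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    B = rng.random((n_bins, rank)) + eps    # spectral basis vectors
    G = rng.random((rank, n_frames)) + eps  # time-varying gains
    history = []
    for _ in range(n_iter):
        # Multiplicative update for the basis matrix.
        V_hat = B @ G + eps
        B *= ((W * V / V_hat) @ G.T) / (W @ G.T + eps)
        # Multiplicative update for the gain matrix.
        V_hat = B @ G + eps
        G *= (B.T @ (W * V / V_hat)) / (B.T @ W + eps)
        history.append(weighted_kl(V, B @ G, W))
    return B, G, history


# Toy data: 24 frequency rows, 40 frames (random stand-in for a spectrogram).
rng = np.random.default_rng(1)
V = rng.random((24, 40)) + 0.1

# Placeholder per-row weights, NOT derived from any perceptual model.
band_weights = np.linspace(1.0, 0.2, 24)
W = np.repeat(band_weights[:, None], 40, axis=1)

B, G, history = weighted_nmf(V, W, rank=3)
print(history[0], history[-1])
```

In this sketch the weights only rescale how much each time-frequency entry contributes to the objective, so rows with larger weights are fitted more closely. An adaptive scheme such as the one described above would presumably recompute `W` from the signal rather than fix it in advance.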