Music signal separation based on Bayesian spectral amplitude estimator with automatic target prior adaptation

In this paper, we propose a new approach to music signal separation based on a generalized Bayesian estimator with automatic prior adaptation. The method consists of three parts: a generalized minimum mean-square error short-time spectral amplitude (MMSE-STSA) estimator with a flexible target-signal prior, a dynamic interference spectrogram estimator based on nonnegative matrix factorization (NMF), and closed-form parameter estimation for the statistical model of the target signal based on higher-order statistics. The statistical model parameter of the hidden target signal is estimated automatically, which enables optimal Bayesian estimation with online adaptation of the target-signal prior. Experimental evaluation shows the efficacy of the proposed method.
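
As a rough illustration of the NMF component only, the sketch below factorizes a nonnegative spectrogram-like matrix V into basis spectra W and activations H using the standard multiplicative updates for squared Euclidean distance. The abstract does not state which NMF cost, constraints, or supervision scheme the method uses, so the cost function, the function name `nmf_euclidean`, and the toy data are assumptions made for illustration and are not the authors' estimator.

```python
import numpy as np


def nmf_euclidean(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Approximate V (n_freq x n_frames, nonnegative) as W @ H.

    Uses multiplicative updates for the squared Euclidean cost.
    This is an assumed, generic NMF variant, not the paper's method.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Update activations, then bases; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(V, W, H):
    """Frobenius-norm reconstruction error relative to the norm of V."""
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)


# Toy "spectrogram": an exactly rank-2 nonnegative matrix (hypothetical data).
rng = np.random.default_rng(1)
V = rng.random((16, 2)) @ rng.random((2, 40))

W, H = nmf_euclidean(V, rank=2)
print("relative reconstruction error:", relative_error(V, W, H))
```

In a separation setting, a factorization of this kind would typically supply the interference spectrogram estimate that a spectral amplitude estimator then uses as its noise term; how the proposed method couples the two is not described in the abstract.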
