论文信息 - Speech Enhancement Using Non-Negative Spectrogram Models with Mel-Generalized Cepstral Regularization

Speech Enhancement Using Non-Negative Spectrogram Models with Mel-Generalized Cepstral Regularization

Spectral domain speech enhancement algorithms based on nonnegative spectrogram models such as non-negative matrix factorization (NMF) and non-negative matrix factor deconvolution are powerful in terms of signal recovery accuracy, however they do not directly lead to an enhancement in the feature domain (e.g., cepstral domain) or in terms of perceived quality. We have previously proposed a method that makes it possible to enhance speech in the spectral and cepstral domains simultaneously. Although this method was shown to be effective, the devised algorithm was computationally demanding. This paper proposes yet another formulation that allows for a fast implementation by replacing the regularization term with a divergence measure between the NMF model and the mel-generalized cepstral (MGC) representation of the target spectrum. Since the MGC is an auditory-motivated representation of an audio signal widely used in parametric speech synthesis, we also expect the proposed method to have an effect in enhancing the perceived quality. Experimental results revealed the effectiveness of the proposed method in terms of both the signal-to-distortion ratio and the cepstral distance.

Li Li | Hirokazu Kameoka | Tomoki Toda | Shoji Makino

[1] Li Li,et al. Semi-Supervised Joint Enhancement of Spectral and Cepstral Sequences of Noisy Speech , 2016, INTERSPEECH.

[2] D. Hunter,et al. A Tutorial on MM Algorithms , 2004 .

[3] 益子貴史,et al. HMM-based speech synthesis and its applications , 2003 .

[4] Shigeru Katagiri,et al. ATR Japanese speech database as a tool of speech recognition and synthesis , 1990, Speech Commun..

[5] H. Kameoka,et al. Convergence-guaranteed multiplicative algorithms for nonnegative matrix factorization with β-divergence , 2010, 2010 IEEE International Workshop on Machine Learning for Signal Processing.

[6] Paris Smaragdis,et al. Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs , 2004, ICA.

[7] Keiichi Tokuda,et al. Mel-generalized cepstral analysis - a unified approach to speech spectral estimation , 1994, ICSLP.

[8] Jérôme Idier,et al. Algorithms for Nonnegative Matrix Factorization with the β-Divergence , 2010, Neural Computation.

[9] Bhiksha Raj,et al. Supervised and Semi-supervised Separation of Sounds from Single-Channel Mixtures , 2007, ICA.