Probabilistic Inference of Speech Signals from Phaseless Spectrograms

Many techniques for complex speech processing, such as denoising and deconvolution, time/frequency warping, multiple-speaker separation, and multiple-microphone analysis, operate on sequences of short-time power spectra (spectrograms), a representation that is often well suited to these tasks. However, a significant problem with algorithms that manipulate spectrograms is that the output spectrogram has no phase component, and phase is needed to create a time-domain signal with good perceptual quality. Here we describe a generative model of time-domain speech signals and their spectrograms, and show how an efficient optimizer can be used to find the maximum a posteriori (MAP) speech signal given the spectrogram. In contrast to techniques that alternate between estimating the phase and a spectrally consistent signal, our technique directly infers the speech signal, thus jointly optimizing the phase and a spectrally consistent signal. We compare our technique with a standard method using signal-to-noise ratios, and we also provide audio files on the web that demonstrate the improvement in perceptual quality our technique offers.
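The abstract does not spell out the generative model. The following is therefore only a minimal sketch of the general idea of optimizing the waveform directly against a phaseless spectrogram, under assumptions of our own: a Gaussian likelihood on power-spectrogram residuals, a zero-mean Gaussian prior on the samples, and SciPy's L-BFGS-B quasi-Newton optimizer. The frame length, hop, Hann window, prior weight, test signal, and helper names (`power_spectrogram`, `neg_log_posterior`, `infer_signal`) are hypothetical illustration choices. This is not the authors' model.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical analysis settings for this sketch (not taken from the paper).
N = 64                    # frame length in samples
HOP = 16                  # hop between frames (75% overlap)
WINDOW = np.hanning(N)    # Hann analysis window


def frame_starts(num_samples):
    """Start index of every full frame that fits inside the signal."""
    return np.arange(0, num_samples - N + 1, HOP)


def frames_of(x, starts):
    """Windowed analysis frames, shape (len(starts), N)."""
    return np.stack([WINDOW * x[s:s + N] for s in starts])


def power_spectrogram(x):
    """Phaseless representation: |FFT(windowed frame)|^2 for every frame."""
    return np.abs(np.fft.fft(frames_of(x, frame_starts(len(x))), axis=1)) ** 2


def neg_log_posterior(x, target_power, prior_weight):
    """Assumed MAP energy and its gradient with respect to the waveform x.

    Likelihood: Gaussian on power-spectrogram residuals.
    Prior: zero-mean Gaussian on the samples (a stand-in for a real speech prior).
    """
    starts = frame_starts(len(x))
    spectra = np.fft.fft(frames_of(x, starts), axis=1)
    residual = np.abs(spectra) ** 2 - target_power
    energy = 0.5 * np.sum(residual ** 2) + 0.5 * prior_weight * np.dot(x, x)

    # Gradient of 0.5 * sum(residual**2) w.r.t. each real frame: 2N * Re(ifft(residual * X)).
    frame_grads = 2.0 * N * np.real(np.fft.ifft(residual * spectra, axis=1))
    grad = prior_weight * x
    for s, g in zip(starts, frame_grads):
        grad[s:s + N] += WINDOW * g  # chain rule through the window, overlap-added
    return energy, grad


def infer_signal(target_power, num_samples, prior_weight=1e-3, seed=0):
    """Optimize the samples directly, so phase and waveform are estimated jointly."""
    x0 = 0.1 * np.random.default_rng(seed).standard_normal(num_samples)
    result = minimize(neg_log_posterior, x0, args=(target_power, prior_weight),
                      jac=True, method="L-BFGS-B", options={"maxiter": 500})
    return result.x


# Usage: recover a two-tone signal from its phaseless spectrogram.
t = np.arange(512)
clean = np.sin(2 * np.pi * 0.05 * t) + 0.5 * np.sin(2 * np.pi * 0.12 * t)
target = power_spectrogram(clean)
estimate = infer_signal(target, len(clean))
rel_err = np.linalg.norm(power_spectrogram(estimate) - target) / np.linalg.norm(target)
print(f"relative spectrogram error: {rel_err:.3f}")
```

The optimizer updates samples rather than per-frame phases, so every overlapping frame is consistent with a single waveform by construction. An alternating scheme such as the Griffin-Lim algorithm instead moves back and forth between per-frame phase estimates and a signal estimate. The sketch scores the result on the spectrogram rather than the waveform because `x` and `-x` have identical spectrograms, so the signal is recoverable at best up to a global sign.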