Real-Time Signal Estimation From Modified Short-Time Fourier Transform Magnitude Spectra

An algorithm for estimating signals from short-time magnitude spectra is introduced that offers a significant improvement in quality and efficiency over current methods. The key issue is how to invert a sequence of overlapping magnitude spectra (a “spectrogram”) containing no phase information to generate a real-valued signal free of audible artifacts. It is also important that the algorithm operate in real time, both structurally and computationally. In the context of spectrogram inversion, structurally real-time means that the audio signal at any given point in time depends only on transform frames at local or prior points in time. Computationally real-time means that the algorithm is efficient enough to run in less time than the reconstructed audio takes to play on the available hardware. The spectrogram inversion algorithm is parameterized to allow tradeoffs between computational demands and the quality of the signal reconstruction. The algorithm is applied to audio time-scale and pitch modification and compared to classical algorithms for these tasks on a variety of signal types, including monophonic and polyphonic audio such as speech and music.
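The frame-by-frame idea behind structurally real-time inversion can be illustrated with a short program. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. It uses a simplified, causal variant of Griffin and Lim's iterative magnitude-only reconstruction: each frame is refined a fixed number of times against the overlap-add of frames already committed, then committed and never revisited, so an output sample depends only on the current and earlier magnitude frames. All names (`hann`, `stft`, `istft`, `invert_frame_by_frame`, `spectral_error`) are hypothetical. So are the specific choices: a periodic Hann window with 75% overlap, a zero initial phase, and the iteration count `iters`, which stands in for the quality-versus-computation parameter mentioned above. A small radix-2 FFT is included so that only the standard library is needed.

```python
# Hypothetical sketch: a simplified, causal (frame-by-frame) Griffin-Lim variant.
# Not the paper's exact algorithm; names and parameters are illustrative.
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def ifft(spec):
    """Inverse FFT via conj(fft(conj(X))) / n."""
    n = len(spec)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in spec])]


def hann(n):
    """Periodic Hann window; its squares sum to 1.5 at hop n // 4."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft(x, w, hop):
    """Complex spectra of windowed frames starting at 0, hop, 2*hop, ..."""
    n = len(w)
    return [fft([x[s + i] * w[i] for i in range(n)])
            for s in range(0, len(x) - n + 1, hop)]


def window_sum(w, hop, length):
    """Sum of squared windows per sample (least-squares ISTFT normaliser)."""
    ws = [0.0] * length
    for s in range(0, length - len(w) + 1, hop):
        for i, wi in enumerate(w):
            ws[s + i] += wi * wi
    return ws


def istft(frames, w, hop, length):
    """Least-squares overlap-add inverse of stft() when the phase is known."""
    acc, ws = [0.0] * length, window_sum(w, hop, length)
    for m, spec in enumerate(frames):
        for i, v in enumerate(ifft(spec)):
            acc[m * hop + i] += w[i] * v.real
    return [a / s if s > 1e-8 else 0.0 for a, s in zip(acc, ws)]


def invert_frame_by_frame(mags, w, hop, length, iters=8):
    """Estimate a signal from STFT magnitudes only, one frame at a time.

    Each frame is refined `iters` times against the overlap-add of frames
    already committed, then committed for good, so no frame uses look-ahead.
    """
    n, ws = len(w), window_sum(w, hop, length)
    acc = [0.0] * length                 # overlap-add of w * y_m, committed frames
    for m, mag in enumerate(mags):
        s = m * hop
        y = [0.0] * n                    # start from committed frames only
        for _ in range(iters):
            # Window the normalised partial reconstruction over this frame.
            seg = [w[i] * (acc[s + i] + w[i] * y[i]) / max(ws[s + i], 1e-8)
                   for i in range(n)]
            # Keep the estimated phase, impose the target magnitude.
            spec = [a * cmath.exp(1j * cmath.phase(z))
                    for a, z in zip(mag, fft(seg))]
            y = [v.real for v in ifft(spec)]
        for i in range(n):
            acc[s + i] += w[i] * y[i]
    return [a / v if v > 1e-8 else 0.0 for a, v in zip(acc, ws)]


def spectral_error(x, mags, w, hop):
    """Relative distance between target and re-analysed STFT magnitudes."""
    num = den = 0.0
    for spec, mag in zip(stft(x, w, hop), mags):
        for z, a in zip(spec, mag):
            num += (abs(z) - a) ** 2
            den += a * a
    return math.sqrt(num / den)


if __name__ == "__main__":
    n_fft, hop, length = 64, 16, 512
    w = hann(n_fft)
    x = [math.sin(0.1 * math.pi * t) + 0.5 * math.sin(0.26 * math.pi * t)
         for t in range(length)]
    mags = [[abs(z) for z in spec] for spec in stft(x, w, hop)]
    x_hat = invert_frame_by_frame(mags, w, hop, length, iters=8)
    print(f"relative magnitude error: {spectral_error(x_hat, mags, w, hop):.3f}")
```

Running the script prints the relative distance between the target magnitudes and those of the reconstruction. A value of 0 would mean a perfectly consistent spectrogram. Raising `iters` typically lowers this error at a proportional extra cost. For time-scale modification, the same magnitude frames could be passed to `invert_frame_by_frame` with a synthesis hop that differs from the analysis hop, using `length = (len(mags) - 1) * hop + len(w)`.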
