Non-parametric techniques for pitch-scale and time-scale modification of speech

Abstract: Time-scale and, to a lesser extent, pitch-scale modifications of speech and audio signals are of major theoretical and practical interest. Applications are numerous and include text-to-speech synthesis (based on acoustical unit concatenation), transformation of voice characteristics, and foreign language learning, as well as audio monitoring and film/soundtrack post-synchronization. To meet the need for high-quality time and pitch scaling, a number of algorithms have been proposed recently, along with real-time implementations, some of them for very inexpensive hardware. Most of these algorithms can be viewed as slight variations of a small number of basic schemes. This contribution reviews frequency-domain algorithms (phase vocoder) and time-domain algorithms (Time-Domain Pitch-Synchronous Overlap/Add and the like) within a single framework. More recent variations of these schemes are also presented.
