Shape invariant time-scale and pitch modification of speech

The simplified linear model of speech production predicts that when the rate of articulation is changed, the resulting waveform takes on the appearance of the original, except for a change in the time scale. A time-scale modification system that preserves this shape-invariance property during voicing is developed. This is done using a version of the sinusoidal analysis-synthesis system that models and independently modifies the phase contributions of the vocal tract and vocal cord excitation. An important property of the system is its ability to perform time-varying rates of change. Extensions of the method are applied to fixed and time-varying pitch modification of speech. The sine-wave analysis-synthesis system also allows for shape-invariant joint time-scale and pitch modification, and allows for the adjustment of the time scale and pitch according to speech characteristics such as the degree of voicing.
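The idea of modifying the excitation and vocal tract phase contributions separately can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes exactly harmonic sine waves, per-sample parameter tracks (fundamental frequency, harmonic amplitudes, unwrapped vocal tract phases) that are already available, and a fixed scaling factor. The function names `synth_harmonic`, `time_scale`, and `_warp` and the parameter `rho` are hypothetical. The vocal tract amplitudes and phases are simply time-warped, while the excitation phase is re-integrated from the warped fundamental, so each pitch period keeps its shape.

```python
import numpy as np

def synth_harmonic(f0, amps, sys_phase, fs):
    """Sum of harmonics. f0: (N,) in Hz; amps, sys_phase: (N, K)."""
    k = np.arange(1, amps.shape[1] + 1)
    # Excitation phase: running integral of the fundamental, zero at sample 0.
    exc = 2 * np.pi * np.concatenate(([0.0], np.cumsum(f0[:-1]))) / fs
    # Harmonic k has phase k * excitation phase + vocal tract (system) phase.
    return np.sum(amps * np.cos(np.outer(exc, k) + sys_phase), axis=1)

def _warp(track, src):
    """Linearly interpolate each column of an (N, K) track at positions src."""
    idx = np.arange(track.shape[0])
    return np.stack([np.interp(src, idx, track[:, j])
                     for j in range(track.shape[1])], axis=1)

def time_scale(f0, amps, sys_phase, fs, rho):
    """Stretch (rho > 1) or compress (rho < 1) by a fixed factor rho.

    Assumption: sys_phase is unwrapped, so linear interpolation is safe.
    """
    n_out = int(round(rho * len(f0)))
    src = np.arange(n_out) / rho              # fractional input positions
    f0_w = np.interp(src, np.arange(len(f0)), f0)
    # Vocal tract parameters are only time-warped; the excitation phase is
    # rebuilt inside synth_harmonic by integrating the warped fundamental.
    return synth_harmonic(f0_w, _warp(amps, src), _warp(sys_phase, src), fs)
```

For a stationary voiced segment (constant fundamental, amplitudes, and phases), the stretched output is the same periodic waveform continued for longer, which is the shape-invariance property in its simplest form. A time-varying rate would replace the fixed `rho` with a warping function, which this sketch does not cover.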
