Comparing several models for perceptual long-term modeling of amplitude and phase trajectories of sinusoidal speech

The so-called Long-Term (LT) modeling of sinusoidal parameters, proposed in previous papers, consists of modeling the entire time trajectory of the amplitude and phase parameters over large sections of voiced speech, in contrast to the usual Short-Term models, which are defined on a frame-by-frame basis. In the present paper, we focus on a specific novel contribution to this general framework: the comparison of four different Long-Term models, namely a polynomial model, a model based on discrete cosine functions, and combinations of discrete cosine functions with sine functions or with polynomials. Their performance is compared in terms of synthesized signal quality, data compression and modeling accuracy, and the relevance of the study to speech coding is shown.
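
As a rough illustration of what fitting a Long-Term model to a trajectory can look like, the sketch below fits a polynomial basis and a discrete cosine basis to one synthetic amplitude trajectory by ordinary least squares and reports the residual error for each. Everything in it is an assumption made for illustration: the function names, the synthetic trajectory, the model order, and the use of plain unweighted least squares. The abstract does not specify the paper's fitting procedure, its perceptual criterion, or the exact form of the combined models, so none of those are reproduced here.

```python
import numpy as np

def polynomial_basis(n_frames, order):
    # Hypothetical helper: columns are t**0 ... t**order, with the time axis
    # normalized to [-1, 1] to keep the least-squares problem well conditioned.
    t = np.linspace(-1.0, 1.0, n_frames)
    return np.vander(t, order + 1, increasing=True)

def cosine_basis(n_frames, order):
    # Hypothetical helper: columns are cos(pi * k * (n + 0.5) / N) for
    # k = 0..order, i.e. DCT-II style functions (an assumed choice).
    n = np.arange(n_frames)
    k = np.arange(order + 1)
    return np.cos(np.pi * np.outer(n + 0.5, k) / n_frames)

def fit_trajectory(trajectory, basis):
    # Unweighted least-squares fit; returns the model coefficients and the
    # trajectory reconstructed from them.
    coeffs, *_ = np.linalg.lstsq(basis, trajectory, rcond=None)
    return coeffs, basis @ coeffs

# Synthetic stand-in for one partial's amplitude trajectory over a voiced
# section (not real speech data).
n_frames = 100
frames = np.arange(n_frames)
trajectory = (1.0
              + 0.5 * np.sin(2 * np.pi * frames / n_frames)
              + 0.1 * np.cos(6 * np.pi * frames / n_frames))

order = 10  # assumed model order: 11 coefficients instead of 100 frame values
for name, make_basis in (("polynomial", polynomial_basis),
                         ("discrete cosine", cosine_basis)):
    coeffs, modeled = fit_trajectory(trajectory, make_basis(n_frames, order))
    rmse = np.sqrt(np.mean((trajectory - modeled) ** 2))
    print(f"{name}: {coeffs.size} coefficients, RMSE = {rmse:.2e}")
```

In this sketch the compression comes from storing `order + 1` coefficients per trajectory rather than one value per frame, and the RMSE stands in for modeling accuracy; a comparison in the spirit of the paper would also need a perceptual measure of the synthesized signal.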
