Paper Information - A Long-Term Harmonic Plus Noise Model for Speech Signals

The harmonic plus noise model (HNM) is widely used for spectral modeling of mixed harmonic/noise speech sounds. In this paper, we present an analysis/synthesis system based on a long-term two-band HNM. "Long-term" means that the time-trajectories of the HNM parameters are modeled using "smooth" (discrete cosine) functions depending on a small set of parameters. The goal is to capture and exploit the long-term correlation of spectral components on time segments of up to several hundred ms. The proposed long-term HNM jointly enables a compact representation of signals (and thus a potential for low bit-rate coding) and easy signal transformation (e.g. time stretching) directly from the long-term parameters. Experiments show that it compares favourably with the short-term version in terms of parameter rates and signal quality.

Index Terms: speech analysis/synthesis, harmonic + noise model, long-term processing.
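The long-term idea described in the abstract can be illustrated with a minimal sketch: a frame-wise parameter trajectory is fitted with a few discrete cosine functions, and time stretching is obtained by evaluating the same coefficients on a denser time grid. This is not the authors' exact formulation. The basis definition, the model order, the least-squares fit, and the synthetic F0 trajectory are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def dct_basis(num_frames, order):
    """Cosine basis on a normalized time axis (assumed form, not from the paper).

    Column p holds cos(pi * p * (n + 0.5) / num_frames) for n = 0..num_frames-1.
    """
    t = (np.arange(num_frames) + 0.5) / num_frames  # normalized time in (0, 1)
    p = np.arange(order + 1)
    return np.cos(np.pi * np.outer(t, p))

def fit_trajectory(values, order):
    """Least-squares fit of order + 1 cosine coefficients to a trajectory."""
    basis = dct_basis(len(values), order)
    coeffs, *_ = np.linalg.lstsq(basis, values, rcond=None)
    return coeffs

def synthesize_trajectory(coeffs, num_frames):
    """Evaluate the cosine model on num_frames points spanning the same segment."""
    return dct_basis(num_frames, len(coeffs) - 1) @ coeffs

# Hypothetical F0 trajectory (Hz) over one segment of 50 short-term frames.
frames = 50
t = np.linspace(0.0, 1.0, frames)
f0 = 120.0 + 30.0 * np.sin(2.0 * np.pi * 0.8 * t)

# Long-term model: 9 coefficients stand in for 50 frame-wise values.
coeffs = fit_trajectory(f0, order=8)
recon = synthesize_trajectory(coeffs, frames)

# Time stretching by a factor of 2: same coefficients, twice as many frames.
stretched = synthesize_trajectory(coeffs, 2 * frames)

print("RMS fit error (Hz):", np.sqrt(np.mean((recon - f0) ** 2)))
print("frames before/after stretching:", len(recon), len(stretched))
```

The model order trades compactness against accuracy: fewer coefficients give a lower parameter rate but a smoother, less faithful trajectory.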
