A Long-Term Harmonic Plus Noise Model for Speech Signals

The harmonic plus noise model (HNM) is widely used for spectral modeling of mixed harmonic/noise speech sounds. In this paper, we present an analysis/synthesis system based on a long-term two-band HNM. "Long-term" means that the time trajectories of the HNM parameters are modeled using smooth (discrete cosine) functions that depend on a small set of parameters. The goal is to capture and exploit the long-term correlation of spectral components over time segments of up to several hundred milliseconds. The proposed long-term HNM jointly enables a compact representation of signals (hence a potential for low bit-rate coding) and easy signal transformation (e.g., time stretching) directly from the long-term parameters. Experiments show that it compares favourably with the short-term version in terms of parameter rates and signal quality.

Index Terms: speech analysis/synthesis, harmonic + noise model, long-term processing.
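To illustrate the long-term idea, the following is a minimal sketch (not the system described in this paper) that fits a low-order discrete cosine model to a single parameter trajectory and then time-stretches it by re-evaluating the model on a denser frame grid. The function names, the model order of 8, the least-squares fitting, and the toy rise-fall F0 contour are all assumptions made for this example.

```python
import numpy as np

def dct_basis(num_frames, order):
    """Cosine basis of shape (num_frames, order + 1) on a normalized time axis.

    Assumed form for illustration: column p is cos(pi * p * x), x in (0, 1).
    """
    x = (np.arange(num_frames) + 0.5) / num_frames
    p = np.arange(order + 1)
    return np.cos(np.pi * np.outer(x, p))

def fit_trajectory(values, order):
    """Least-squares fit of order + 1 cosine coefficients to a trajectory."""
    basis = dct_basis(len(values), order)
    coeffs, *_ = np.linalg.lstsq(basis, values, rcond=None)
    return coeffs

def synthesize_trajectory(coeffs, num_frames):
    """Evaluate the fitted model on a grid of num_frames frames."""
    return dct_basis(num_frames, len(coeffs) - 1) @ coeffs

# Toy rise-fall F0 contour (Hz) over 60 frames; hypothetical data.
num_frames = 60
x = (np.arange(num_frames) + 0.5) / num_frames
f0 = 120.0 + 25.0 * np.exp(-((x - 0.5) / 0.2) ** 2)

coeffs = fit_trajectory(f0, order=8)          # 9 numbers describe 60 frames
recon = synthesize_trajectory(coeffs, 60)     # same duration
stretched = synthesize_trajectory(coeffs, 90) # 1.5x time stretching
max_err = float(np.max(np.abs(recon - f0)))
```

In this sketch, time stretching needs no change to the coefficients: the same 9 values are simply evaluated on 90 frames instead of 60, which is the kind of transformation "directly from the long-term parameters" that the abstract refers to.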
