A long term harmonic plus noise model for narrow-band speech coding at very low bit-rates

This paper presents a very low bit-rate speech codec based on the long-term Harmonic plus Noise Model (LT-HNM). The HNM is known to represent speech signals efficiently because it relies on natural parameters: fundamental and voicing cut-off frequencies, harmonics and noise frequencies. In addition, long-term modeling is particularly effective at reducing the amount of data needed to represent the model parameters. In this paper we combine both approaches, long-term modeling and HNM, to develop a very low bit-rate coder for narrow-band speech. The coder reaches bit-rates as low as 2.3 kbps with an objective listening quality score of 2.3 on the Perceptual Evaluation of Speech Quality (PESQ) scale.
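To illustrate why long-term modeling reduces the data size of the model parameters, here is a minimal sketch under stated assumptions. It represents a whole trajectory of one parameter (here a synthetic fundamental-frequency contour) with a few coefficients of an orthonormal DCT-II instead of one value per frame. The abstract does not specify the basis or model order used by the codec, so the DCT, the function names `dct_fit` and `dct_reconstruct`, and the example data are all hypothetical.

```python
import math

def _scale(k, n):
    # Orthonormal DCT-II scaling factor (assumed basis, not from the paper).
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct_fit(trajectory, order):
    """Return the first `order` orthonormal DCT-II coefficients of a trajectory."""
    n = len(trajectory)
    return [
        _scale(k, n) * sum(
            x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, x in enumerate(trajectory)
        )
        for k in range(order)
    ]

def dct_reconstruct(coeffs, n):
    """Rebuild an n-frame trajectory from (possibly truncated) DCT-II coefficients."""
    return [
        sum(
            _scale(k, n) * c * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for k, c in enumerate(coeffs)
        )
        for i in range(n)
    ]

def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Synthetic example: a smooth 50-frame F0 contour in Hz (illustrative data only).
f0 = [120.0 + 15.0 * math.sin(2.0 * math.pi * i / 50.0) for i in range(50)]

coeffs = dct_fit(f0, order=8)            # 8 numbers instead of 50
f0_hat = dct_reconstruct(coeffs, len(f0))
print(len(f0), "frames ->", len(coeffs), "coefficients")
print("RMS error (Hz):", rms_error(f0, f0_hat))
```

Because the basis is orthonormal, keeping more coefficients never increases the squared reconstruction error, so the model order acts as a direct trade-off between bit-rate and trajectory accuracy in this sketch.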
