Perceptual long-term harmonic plus noise modeling for speech data compression

The harmonic plus noise model (HNM) is widely used for modeling audio signals. In this paper, we introduce perceptual frequency masking into the 2-band HNM developed by Stylianou et al. and apply it to speech signals. An auditory model identifies inaudible sinusoids, which are then removed from the model's parameter set to reduce the data size for speech coding. The proposed perceptual HNM was applied to a large speech database drawn from TIMIT and HINT and achieved a substantial reduction in parameter rate (up to 50% in short-term frames), yielding a significant data-rate reduction for the long-term (LT) HNM. The latter is based on LT trajectory modeling of the short-term (ST) HNM parameters. Objective and subjective quality evaluations show that the perceptual HNM introduces no additional distortion compared to the generic 2-band HNM.
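To illustrate the pruning idea, here is a minimal Python sketch of masking-based removal of sinusoids from one short-term frame. It is not the auditory model used in the paper, which the abstract does not specify. It assumes component levels calibrated in dB SPL, Terhardt's approximation of the threshold in quiet, Zwicker's Bark-scale approximation, and a crude triangular spreading function whose offset and slopes (`offset_db`, `lower_slope`, `upper_slope`) are illustrative values only. All function names are hypothetical.

```python
import math

def bark(f_hz):
    """Zwicker's approximation of the Bark (critical-band) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute hearing threshold (dB SPL)."""
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4

def masked_threshold_db(f_hz, maskers, offset_db=10.0,
                        lower_slope=27.0, upper_slope=10.0):
    """Audibility threshold at f_hz given (frequency, level) maskers.

    Assumption: each masker contributes a triangular skirt on the Bark
    scale, and contributions are combined with max() rather than summed
    in the power domain. The constants are illustrative, not from the paper.
    """
    z = bark(f_hz)
    threshold = threshold_in_quiet_db(f_hz)
    for masker_hz, masker_db in maskers:
        dz = z - bark(masker_hz)
        # Masking spreads further toward higher frequencies (shallower slope).
        slope = upper_slope if dz >= 0 else lower_slope
        threshold = max(threshold, masker_db - offset_db - slope * abs(dz))
    return threshold

def prune_inaudible(components):
    """Keep only the (frequency_hz, level_db_spl) pairs above their masked threshold."""
    kept = []
    for i, (f_hz, level_db) in enumerate(components):
        others = components[:i] + components[i + 1:]
        if level_db > masked_threshold_db(f_hz, others):
            kept.append((f_hz, level_db))
    return kept

# Hypothetical frame: the 400 Hz partial is masked by the strong 200 Hz partial,
# and the 3 kHz partial lies below the threshold in quiet.
frame = [(200.0, 70.0), (400.0, 30.0), (600.0, 65.0), (3000.0, -10.0)]
print(prune_inaudible(frame))  # -> [(200.0, 70.0), (600.0, 65.0)]
```

In this toy frame, two of the four sinusoids are dropped, which is the kind of per-frame parameter reduction the abstract reports. A realistic system would need a calibrated playback level and a more complete psychoacoustic model.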
