Coding with side information techniques for LSF reconstruction in voice over IP

In applications such as VoIP, speech codecs must cope with excessive packet losses caused by network errors and/or delays. This paper presents a new method for reconstructing lost speech spectral envelopes, based on a statistical estimation function. We suggest the use of a minimal "corrective" bitstream and propose coding with side information (CSI) techniques for an efficient forward error correction (FEC) strategy. The proposed methods are tested in multiple scenarios of missing frames. Objective results indicate that, with only 4 bits per lost frame, spectral distortion is reduced by 0.77-1.14 dB relative to current state-of-the-art estimation methods. Compared to "predictive" estimation methods, using the jitter buffer as side information together with 4 bits per lost frame provides a 42% reduction in spectral distortion for single packet losses and a 32% reduction for double packet losses. Subjective results indicate that the corrected speech has fewer artifacts.
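
The idea of a small corrective bitstream on top of a decoder-side estimate can be illustrated with a minimal sketch. It is not the method of the paper: the estimator here is a plain average of the two neighbouring frames held in the jitter buffer (standing in for the statistical estimation function), and the 16-entry residual codebook is random toy data. All function names, the codebook and the frame values are hypothetical and chosen only for illustration.

```python
import random

BITS = 4  # corrective bits per lost frame, matching the figure quoted in the abstract


def make_toy_codebook(dim, seed=0, scale=0.05):
    """Hypothetical residual codebook with 2**BITS entries (random toy data).

    Entry 0 is the zero vector, so the correction can never be worse than
    using the decoder-side estimate on its own.
    """
    rng = random.Random(seed)
    codebook = [[0.0] * dim]
    for _ in range(2 ** BITS - 1):
        codebook.append([rng.uniform(-scale, scale) for _ in range(dim)])
    return codebook


def estimate_from_neighbors(prev_frame, next_frame):
    """Side-information estimate: average of the neighbouring LSF vectors.

    Assumption: a simple stand-in for a statistical estimator.
    """
    return [(p + n) / 2.0 for p, n in zip(prev_frame, next_frame)]


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode_correction(codebook, prev_frame, true_frame, next_frame):
    """Sender side: index of the codebook entry closest to the estimation residual."""
    estimate = estimate_from_neighbors(prev_frame, next_frame)
    residual = [t - e for t, e in zip(true_frame, estimate)]
    return min(range(len(codebook)),
               key=lambda i: squared_error(codebook[i], residual))


def decode_lost_frame(codebook, prev_frame, next_frame, index):
    """Receiver side: estimate from the jitter buffer plus the signalled correction."""
    estimate = estimate_from_neighbors(prev_frame, next_frame)
    return [e + c for e, c in zip(estimate, codebook[index])]


if __name__ == "__main__":
    dim = 10
    codebook = make_toy_codebook(dim)
    prev_frame = [0.25 * (i + 1) for i in range(dim)]   # toy LSF vector (radians)
    next_frame = [v + 0.02 for v in prev_frame]
    true_frame = [v + 0.03 * ((-1) ** i) for i, v in enumerate(prev_frame)]

    index = encode_correction(codebook, prev_frame, true_frame, next_frame)
    corrected = decode_lost_frame(codebook, prev_frame, next_frame, index)
    estimate = estimate_from_neighbors(prev_frame, next_frame)
    print("index sent:", index)
    print("squared error, estimate only:", squared_error(true_frame, estimate))
    print("squared error, with correction:", squared_error(true_frame, corrected))
```

In this sketch the sender computes the same estimate the receiver will be able to form and transmits only the index of the best-matching residual, which is the sense in which the neighbouring frames act as side information. A trained codebook and a better estimator would be needed to approach the kind of gains reported above.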
