A sinusoidal voice over packet coder tailored for the frame-erasure channel

A speech coder tailored especially for the frame-erasure channel, the sinusoidal voice over packet coder (SVOPC), is proposed. Because it uses a classified approach, avoids interframe coding techniques, and synthesizes its output from slowly varying parameters, the coder is inherently robust to packet loss. SVOPC is based on quasi-harmonic modeling of the linear prediction (LP) residual. Both the sinusoidal amplitudes and the phases are explicitly encoded using new methods based on Gaussian mixture models. A wide-band (16-kHz sampling frequency) implementation of the coder provides synthesized speech of good subjective quality at around 20 kbps. SVOPC is evaluated by means of subjective listening tests and compared with a reference system based on G.722.2 (the AMR wide-band codec). Under frame-erasure conditions (5% to 30% frame erasures generated according to a Gilbert model), SVOPC clearly outperforms G.722.2.
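SVOPC models the LP residual quasi-harmonically and reconstructs speech by passing it through the LP synthesis filter. The abstract gives no synthesis equations, so the following is only a minimal sketch of the general idea under stated assumptions: "quasi-harmonic" is taken to mean sinusoids at frequencies close to, but not exactly on, multiples of the fundamental f0; a hypothetical per-harmonic offset `detune` (in Hz) stands in for that deviation; parameters are constant over one frame; and the filter convention is A(z) = 1 + Σ a_i z^-i. The function names are illustrative, and no quantization, GMM-based amplitude or phase coding, or frame-to-frame interpolation is shown.

```python
import math

FS = 16_000  # wide-band sampling frequency (16 kHz), as stated for SVOPC


def synth_residual(f0, amps, phases, detune, n_samples, fs=FS):
    """Quasi-harmonic excitation (illustrative assumption):
    e[n] = sum_k a_k * cos(2*pi*(k*f0 + d_k)*n/fs + phi_k), k = 1..K.
    `detune` holds hypothetical per-harmonic frequency offsets d_k in Hz."""
    if not (len(amps) == len(phases) == len(detune)):
        raise ValueError("amps, phases and detune must have equal length")
    out = [0.0] * n_samples
    for k, (a, phi, d) in enumerate(zip(amps, phases, detune), start=1):
        w = 2.0 * math.pi * (k * f0 + d) / fs  # radian frequency of harmonic k
        for n in range(n_samples):
            out[n] += a * math.cos(w * n + phi)
    return out


def lp_synthesis(residual, lpc):
    """All-pole LP synthesis filter 1/A(z), with A(z) = 1 + sum_i lpc[i-1] z^-i.
    Zero initial state: y[n] = e[n] - sum_i a_i * y[n - i]."""
    y = []
    for n, e in enumerate(residual):
        acc = e
        for i, a in enumerate(lpc, start=1):
            if n - i >= 0:
                acc -= a * y[n - i]
        y.append(acc)
    return y
```

For given parameter lists, `lp_synthesis(synth_residual(120.0, amps, phases, detune, 320), lpc)` would produce one 20-ms frame (an illustrative length) at 16 kHz.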

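The abstract states that the 5% to 30% erasure patterns were generated according to a Gilbert model but does not give the burst parameters. As a sketch of how such patterns are commonly produced, the following assumes the simplest two-state Gilbert channel: a "good" state in which frames arrive and a "bad" state in which every frame is lost, with p = P(good to bad) and q = P(bad to good). The target mean burst length and the function names are hypothetical choices, not values from the paper.

```python
import random


def gilbert_erasures(num_frames, p, q, seed=None):
    """Return a list of booleans, True where a frame is erased.

    Simple two-state Gilbert model (assumption): all frames in the bad
    state are lost, none in the good state. p = P(good -> bad),
    q = P(bad -> good)."""
    if not (0.0 < p <= 1.0 and 0.0 < q <= 1.0):
        raise ValueError("p and q must lie in (0, 1]")
    rng = random.Random(seed)
    # Start from the stationary distribution: P(bad) = p / (p + q).
    bad = rng.random() < p / (p + q)
    erased = []
    for _ in range(num_frames):
        erased.append(bad)
        if bad:
            bad = rng.random() >= q  # remain in bad state with probability 1 - q
        else:
            bad = rng.random() < p   # enter bad state with probability p
    return erased


def params_for(loss_rate, mean_burst):
    """Solve for (p, q) giving a target stationary loss rate p / (p + q)
    and a target mean erasure-burst length 1 / q (in frames)."""
    if not (0.0 < loss_rate < 1.0) or mean_burst < 1.0:
        raise ValueError("need 0 < loss_rate < 1 and mean_burst >= 1")
    q = 1.0 / mean_burst
    p = loss_rate * q / (1.0 - loss_rate)
    return p, q


if __name__ == "__main__":
    p, q = params_for(0.10, 2.0)
    pattern = gilbert_erasures(100_000, p, q, seed=1)
    print(sum(pattern) / len(pattern))  # should be close to 0.10
```

With p chosen this way, the stationary loss rate p/(p + q) equals the target and the bad-state runs are geometric with mean 1/q, so loss rate and burstiness can be set independently.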