A sinusoidal voice over packet coder tailored for the frame-erasure channel

A speech coder tailored especially for the frame-erasure channel, the sinusoidal voice over packet coder (SVOPC), is proposed. Because it is based on a classified approach, avoids interframe coding techniques (so a lost frame does not corrupt the decoding of later frames), and synthesizes its output from slowly varying parameters, the coder is inherently robust to packet loss. SVOPC is based on quasi-harmonic modeling of the linear prediction (LP) residual. Both the sinusoidal amplitudes and the phases are explicitly encoded using new methods based on Gaussian mixture models (GMMs). A wide-band (16-kHz sampling frequency) implementation of the coder provides synthesized speech of good subjective quality at around 20 kbps. SVOPC is evaluated by means of subjective listening tests and compared to a reference system based on G.722.2 (the AMR wide-band codec). Under frame-erasure conditions (5% to 30% frame erasures generated according to a Gilbert model), SVOPC clearly outperforms G.722.2.
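The abstract names a Gilbert model as the source of the 5% to 30% frame-erasure patterns but does not give its parameters. A minimal sketch of a simple two-state Gilbert erasure generator follows, assuming the common parameterization in which frames are lost only in the "bad" state. The function name `gilbert_erasures` and its `loss_rate` and `mean_burst` arguments are hypothetical illustrations, not taken from the paper, whose transition probabilities may differ.

```python
import random


def gilbert_erasures(n_frames, loss_rate, mean_burst, seed=None):
    """Simulate frame erasures with a simple two-state Gilbert model.

    Assumed parameterization (hypothetical, not from the paper):
      p = P(good -> bad), q = P(bad -> good)
      stationary loss rate = p / (p + q), mean burst length = 1 / q
    A frame is erased exactly when the chain is in the "bad" state.
    Returns a list of booleans, True meaning the frame is erased.
    """
    if not 0.0 < loss_rate < 1.0:
        raise ValueError("loss_rate must be in (0, 1)")
    if mean_burst < 1.0:
        raise ValueError("mean_burst must be >= 1 frame")

    # Solve for the transition probabilities from the target statistics.
    q = 1.0 / mean_burst
    p = loss_rate * q / (1.0 - loss_rate)
    if p > 1.0:
        raise ValueError("loss_rate too high for this mean_burst")

    rng = random.Random(seed)
    # Draw the initial state from the stationary distribution.
    bad = rng.random() < loss_rate
    erased = []
    for _ in range(n_frames):
        erased.append(bad)
        if bad:
            bad = rng.random() >= q  # remain in "bad" with probability 1 - q
        else:
            bad = rng.random() < p   # enter "bad" with probability p
    return erased
```

For example, `gilbert_erasures(1000, 0.05, 1.5, seed=0)` marks erased frames for a 5% condition; the mean burst length of 1.5 frames is an arbitrary choice for illustration. Specifying the model by loss rate and mean burst length, rather than by raw transition probabilities, makes it easy to sweep the erasure rate while holding burstiness fixed.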

Jonas Lindblom
