Speech coding at low and very low bit rates

Abstract: This paper reviews the main techniques for speech coding at low and very low bit rates, from 50 bit/s to 4,000 bit/s. It then presents in detail the HSX technique for coding at 1,200 bit/s and a new segmental approach that uses acoustic units derived without supervision for bit rates below 400 bit/s.
