Low-Bit-Rate Speech Coding

Low-bit-rate speech coding, at rates below 4 kb/s, is needed for both communication and voice storage applications. At such low rates, the speech waveform itself cannot be encoded faithfully, so low-rate coders rely instead on parametric models that represent only the most perceptually relevant aspects of speech. Although several modeling approaches exist, all can be related to the basic linear model of speech production, in which an excitation signal drives a vocal-tract filter.
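
As a rough illustration of the linear model of speech production, the following minimal sketch passes an excitation signal through an all-pole vocal-tract filter. It is not taken from any particular coder: the function names, the single 500 Hz resonance, the 8 kHz sampling rate, and the 80-sample pitch period are all assumptions chosen for the example. A periodic pulse train stands in for voiced excitation and white noise for unvoiced excitation.

```python
import math
import random


def pulse_train(num_samples, pitch_period):
    """Voiced excitation: one unit pulse every pitch_period samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]


def white_noise(num_samples, seed=0):
    """Unvoiced excitation: zero-mean Gaussian noise (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]


def all_pole_filter(excitation, a, gain=1.0):
    """Vocal-tract filter: s[n] = gain * e[n] - sum_{k=1..p} a[k-1] * s[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * out[n - k]
        out.append(acc)
    return out


def resonator(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients [a1, a2] of a two-pole resonance (a single formant)."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius, < 1
    theta = 2.0 * math.pi * freq_hz / sample_rate        # pole angle
    return [-2.0 * r * math.cos(theta), r * r]


# Assumed illustrative values, not parameters of any standard coder.
SAMPLE_RATE = 8000
a = resonator(500.0, 100.0, SAMPLE_RATE)

# 20 ms of "voiced" and "unvoiced" output from the same filter.
voiced = all_pole_filter(pulse_train(160, 80), a)
unvoiced = all_pole_filter(white_noise(160), a, gain=0.1)
```

In this picture a parametric coder would transmit only a handful of values per frame (filter coefficients, gain, pitch period, and a voicing decision) rather than the samples themselves, which is what makes rates below 4 kb/s feasible.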
