Speech coding for mobile and multimedia applications

Although speech coding has been an active area of research for several decades, recent advances in real-time DSP and the emergence of new applications have renewed interest in the area. Several speech coding algorithms have been adopted in international standards, and study groups are drafting new standards for existing and emerging mobile and multimedia applications. In this paper, we survey speech coding technologies, with emphasis on the methods that are part of recent communication standards. The paper starts with an introduction to speech coding and continues with descriptions of linear predictive vocoders, analysis-by-synthesis linear prediction, sub-band and transform coders, and sinusoidal analysis-synthesis systems. We conclude with a summary of the review and a brief discussion of opportunities for future research.
