Wideband Extension of Narrowband Speech for Enhancement and Coding

Abstract

Most existing telephone networks transmit narrowband coded speech which has been bandlimited to 4 kHz. Compared with normal speech, this speech has a muffled quality and reduced intelligibility, which is particularly noticeable in sounds such as /s/, /f/ and /sh/. Speech which has been bandlimited to 8 kHz is often coded for this reason, but this requires an increase in the bit rate. Wideband enhancement is a scheme that adds a synthesized highband signal to narrowband speech to produce a higher quality wideband speech signal. The synthesized highband signal is based entirely on information contained in the narrowband speech, and is thus achieved at zero increase in the bit rate from a coding perspective. Wideband enhancement can function as a post-processor to any narrowband telephone receiver, or alternatively it can be combined with any narrowband speech coder to produce a very low bit rate wideband speech coder. Applications include higher quality mobile, teleconferencing, and internet telephony.

This thesis examines in detail each component of the wideband enhancement scheme: highband excitation synthesis, highband envelope estimation, and narrowband-highband envelope continuity. Objective and subjective test measures are formulated to assess existing and new methods for all components, and the likely limitations to the performance of wideband enhancement are also investigated. A new method for highband excitation synthesis is proposed that uses a combination of sinusoidal transform coding-based excitation and random excitation. Several new techniques for highband spectral envelope estimation are also developed. The performance of these techniques is shown to be approaching the limit likely to be achieved. Subjective tests demonstrate that wideband speech synthesized using these techniques has higher quality than the input narrowband speech.

Finally, a new paradigm for very low bit rate wideband speech coding is presented in which the quality of the wideband enhancement scheme is improved further by allocating a very small bitstream for highband envelope and gain coding. Thus, this thesis demonstrates that wideband speech can be communicated at or near the bit rate of a narrowband speech coder.
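To make the idea of adding a synthesized highband to narrowband speech concrete, the following is a minimal sketch of a much simpler baseline than the methods developed in the thesis. It assumes an 8 kHz-sampled input, regenerates the 4 to 8 kHz band by spectral folding (zero insertion mirrors the narrowband spectrum into the highband), and uses a fixed gain in place of an estimated highband envelope. The function names, the filter length, and the `highband_gain` value are illustrative assumptions, not taken from the thesis.

```python
import math


def windowed_sinc_lowpass(num_taps, cutoff):
    """Hamming-windowed sinc lowpass FIR; cutoff is in cycles per sample (0 to 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [c / total for c in taps]  # normalize to unity gain at DC


def fir_filter(signal, taps):
    """Zero-delay FIR filtering for odd-length symmetric taps (signal is zero outside its range)."""
    half = len(taps) // 2
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(taps):
            i = n + half - k
            if 0 <= i < len(signal):
                acc += c * signal[i]
        out.append(acc)
    return out


def extend_bandwidth(narrowband, highband_gain=0.3, num_taps=63):
    """Return a signal at twice the input sampling rate with a folded highband added.

    highband_gain is a hypothetical fixed scale factor standing in for a
    proper highband envelope estimate.
    """
    # Zero insertion doubles the sampling rate and creates a mirror image of
    # the 0-4 kHz spectrum in 4-8 kHz; the factor 2 restores the amplitude.
    zero_stuffed = []
    for sample in narrowband:
        zero_stuffed.extend((2.0 * sample, 0.0))
    taps = windowed_sinc_lowpass(num_taps, 0.25)  # cutoff at 4 kHz for a 16 kHz rate
    lowband = fir_filter(zero_stuffed, taps)      # interpolated narrowband speech
    highband = [z - l for z, l in zip(zero_stuffed, lowband)]  # complementary highpass
    return [l + highband_gain * h for l, h in zip(lowband, highband)]


def tone_amplitude(signal, freq_hz, sample_rate):
    """Amplitude of a single sinusoidal component, measured with a one-bin DFT."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / sample_rate) for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / sample_rate) for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


if __name__ == "__main__":
    # A 1 kHz test tone sampled at 8 kHz; folding places its image at 7 kHz.
    tone = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(800)]
    wideband = extend_bandwidth(tone)
    print(tone_amplitude(wideband, 1000, 16000), tone_amplitude(wideband, 7000, 16000))
```

Spectral folding is chosen here only because it fits in a few lines. It places harmonics at mirrored rather than harmonically related frequencies, which is one reason more elaborate excitation synthesis and envelope estimation, such as the techniques examined in the thesis, are of interest.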
