Multiband excitation vocoder

A speech model, referred to as the multiband excitation model, is presented. In this model, the band around each harmonic of the fundamental frequency is declared voiced or unvoiced. Methods for estimating the model parameters and for synthesizing speech from them are described. To illustrate a potential application of the model, an 8 kb/s vocoder is developed and its performance is evaluated. Informal listening and intelligibility tests both show that the vocoder performs very well in speech quality and intelligibility, particularly for noisy speech.
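The per-band voiced/unvoiced idea can be illustrated with a minimal sketch. It is not the estimation method developed in the paper: it assumes a simple peakiness heuristic, in which a band is called voiced when most of its energy sits at the harmonic bin. The function names, the `f0_bins` parameter (the fundamental expressed in DFT bins), the threshold of 0.5, and the synthetic test frame are all hypothetical.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Return |X[m]| for m = 0 .. N/2 using a direct DFT (fine for short frames)."""
    n = len(frame)
    mags = []
    for m in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * m * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def band_decisions(frame, f0_bins, threshold=0.5):
    """Label the band around each harmonic of the fundamental as voiced or unvoiced.

    Assumption: a band is voiced when the harmonic bin holds more than
    `threshold` of the band's energy. This heuristic only stands in for
    the paper's estimator, which the abstract does not spell out.
    """
    mags = dft_magnitudes(frame)
    decisions = []
    k = 1
    # Stop once the upper edge of band k would pass the last available bin.
    while (k + 0.5) * f0_bins <= len(mags) - 1:
        lo = int(round((k - 0.5) * f0_bins))
        hi = int(round((k + 0.5) * f0_bins))
        total = sum(m * m for m in mags[lo:hi])
        peak = mags[int(round(k * f0_bins))] ** 2
        decisions.append(total > 0 and peak / total > threshold)
        k += 1
    return decisions


def demo_frame(n=64, f0_bins=8):
    """Synthetic frame: harmonics 1 and 2 are periodic, band 3 is noise-like.

    The noise-like band is approximated by equal-amplitude components at
    every bin of the band with scattered phases (an illustrative stand-in).
    """
    frame = []
    for t in range(n):
        x = math.cos(2 * math.pi * f0_bins * t / n)
        x += math.cos(2 * math.pi * 2 * f0_bins * t / n)
        for m in range(20, 28):
            x += 0.3 * math.cos(2 * math.pi * m * t / n + 1.7 * m)
        frame.append(x)
    return frame


print(band_decisions(demo_frame(), 8))  # -> [True, True, False]
```

In this toy frame the first two harmonic bands are marked voiced and the third unvoiced, so a single frame carries a mix of both excitation types rather than one global decision.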
