There has been considerable interest in the development of low-bit-rate, high-quality speech analysis/synthesis systems. Applications for such systems include voice mail, low-bit-rate digital communications, and high-security telephony. One class of speech analysis/synthesis systems (vocoders) that has been studied extensively and used widely in practice is based on an underlying model of speech. For this class, the spectrum of each speech segment is represented as the product of an excitation spectrum and a system spectrum. The excitation parameters generally consist of a pitch period and a voiced/unvoiced (V/UV) decision. The system parameters are typically the spectral envelope or the impulse response of the vocal tract. Speech is generated in the vocoder by exciting the system with a periodic impulse train in the case of voiced speech or with random noise in the case of unvoiced speech. While vocoders of this type are capable of producing intelligible speech, they have not been successful in synthesizing high-quality speech. In addition, the performance of these vocoders is known to degrade rapidly in the presence of background noise. Considerable attention has been devoted to improving these systems. These improvements have focused primarily on the specification and quantization of the excitation signal after removal of the pitch structure. While these techniques have improved the quality, they have also significantly increased algorithmic complexity, which has precluded real-time implementation of these systems on low-cost architectures.
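The conventional synthesis step described above, in which a single V/UV decision selects the excitation and that excitation drives the system, can be illustrated with a minimal sketch. It assumes an all-pole (LPC-style) filter as the system, which is only one common choice and is not specified in the text. The function names, the coefficient convention, and the fixed noise seed are hypothetical and chosen for illustration.

```python
import random


def make_excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Return a periodic impulse train (voiced) or white Gaussian noise (unvoiced).

    pitch_period is in samples; seed only makes the noise reproducible.
    """
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def all_pole_filter(excitation, a):
    """Assumed system model: y[n] = x[n] - sum_k a[k] * y[n-1-k].

    a holds the coefficients a_1..a_p of a hypothetical all-pole filter.
    """
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc -= ak * y[n - 1 - k]
        y.append(acc)
    return y


def synthesize_segment(n_samples, voiced, pitch_period, a):
    """Excite the system with the excitation chosen by the V/UV decision."""
    excitation = make_excitation(n_samples, voiced, pitch_period)
    return all_pole_filter(excitation, a)


# Example: one voiced and one unvoiced segment through the same one-pole system.
voiced_segment = synthesize_segment(8, voiced=True, pitch_period=4, a=[-0.5])
unvoiced_segment = synthesize_segment(8, voiced=False, pitch_period=4, a=[-0.5])
```

Filtering in the time domain corresponds to multiplying the excitation and system spectra, which is the representation the model uses. Because one binary decision governs the whole segment, the sketch cannot represent a segment that is partly voiced and partly noise-like.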