Text-to-speech synthesis integrated circuit

Geveze is one of many text-to-speech synthesis implementations developed for various languages. The program is based on vocal tract modeling and compresses speech using the linear predictive coding (LPC) method. During synthesis, for each letter of a given word, the closest matching letter sequence among the words used in training is located, and its parameters are used. As in other systems based on vocal tract modeling, a pulse train provides the excitation for voiced sounds, while a noise signal is used for unvoiced sounds. The excitation is scaled by a gain coefficient specific to the sound at that instant and then passed through an IIR filter whose characteristics are determined by the LPC coefficients, yielding the digitized speech waveform. During training, 10 LPC coefficients, 1 gain, and 1 period information bit are obtained for each 25 ms window, with successive windows spaced 10 ms apart. During synthesis, these values are updated every 10 ms to those of the following window. The digital signal at the output of the IIR filter is converted to analog and passed through a low-pass filter (LPF) to smooth the transitions between windows. After filtering, the analog signal is ready to be amplified. Our objective is to design this system, which already runs on a computer, as an integrated circuit and, if possible, to obtain a single-chip solution with optimum cost and performance.
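
The synthesis path described above (excitation, gain, all-pole IIR filter, parameter update every 10 ms) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Geveze implementation: the 8 kHz sampling rate, the function name `synthesize`, the frame tuple layout, the representation of the period information as a pitch period in samples (0 meaning unvoiced), and the sign convention of the filter are all hypothetical choices that the abstract does not specify.

```python
import random

SAMPLE_RATE = 8000      # assumed; the abstract does not state a sampling rate
FRAME_SHIFT_MS = 10     # parameters change every 10 ms (from the abstract)
FRAME_LEN = SAMPLE_RATE * FRAME_SHIFT_MS // 1000  # samples per parameter update
LPC_ORDER = 10          # 10 LPC coefficients per window (from the abstract)


def synthesize(frames, seed=0):
    """Generate samples from a list of (lpc, gain, pitch_period) frames.

    lpc          -- LPC_ORDER predictor coefficients a[1..p]
    gain         -- excitation gain for the frame
    pitch_period -- assumed encoding: period in samples if voiced, 0 if unvoiced
    """
    rng = random.Random(seed)
    history = [0.0] * LPC_ORDER  # past outputs y[n-1], ..., y[n-p]
    countdown = 0                # samples remaining until the next pulse
    out = []
    for lpc, gain, pitch_period in frames:
        if len(lpc) != LPC_ORDER:
            raise ValueError("expected %d LPC coefficients" % LPC_ORDER)
        for _ in range(FRAME_LEN):
            if pitch_period > 0:
                # Voiced: unit pulse once every pitch_period samples.
                if countdown <= 0:
                    excitation = 1.0
                    countdown = pitch_period
                else:
                    excitation = 0.0
                countdown -= 1
            else:
                # Unvoiced: uniform noise in [-1, 1].
                excitation = rng.uniform(-1.0, 1.0)
                countdown = 0
            # All-pole filter (assumed convention):
            # y[n] = gain * e[n] + sum_k a[k] * y[n-k]
            y = gain * excitation
            for a_k, y_past in zip(lpc, history):
                y += a_k * y_past
            history = [y] + history[:-1]
            out.append(y)
    return out
```

Each frame here produces 80 samples at the assumed rate, so a call such as `synthesize([([0.5] + [0.0] * 9, 1.0, 40)])` returns one 10 ms segment. The filter state is carried across frame boundaries, so only the coefficients, gain, and excitation type switch at each update; the analog low-pass stage mentioned in the abstract is not modeled.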
