Machine-aided formant determination for speech synthesis.

This paper describes a method for speech analysis that is an outgrowth of an attempt to do speech synthesis by rule using a terminal analog synthesizer. In speech synthesis by rule, the rules accept an input string of phonemes and, based on that string, generate control parameters that can then be used to drive a speech synthesizer. The synthetic speech produced by the synthesizer can be subjected to various measures to determine its validity, but the ear must be the final criterion. However, the ear does not indicate in any explicit way how the control parameters might be improved. We feel, therefore, that being able to compare control data generated by rules with those extracted from the real speech of some talker can be a useful guide. Our approach is similar to that of Holmes, Mattingly, and Shearme, who started with a set of rules and then modified these rules, guided by spectrographic analysis and listening. However, there is a basic difference between our approach and that of Holmes et al. In their case, one knows the general bounds of the control parameters and tries to write and modify rules that will generate satisfactory control parameters within these bounds; in our case, one uses analysis to derive detailed control parameters and tries to write rules that will generate these parameters. (The latter scheme has the characteristic of being closely tied to the particular speaker whose speech is analyzed.) To implement our scheme, it was necessary to perform some extensive