A multi-lingual system for the determination of phonetic word stress using soft feature selection by neural networks

Any text-to-speech (TTS) system requires a routine to determine the transcription of out-of-vocabulary (OOV) words. This transcription contains three pieces of information: the phoneme sequence, the positions of the syllable boundaries, and the position of word stress. In the TTS system "Papageno", the phonemes and syllable boundaries are determined by a neural network proposed in [1]. The same paper also proposed a second network for word stress determination. A similar architecture is used here, enhanced by a diagonal matrix between the input and the hidden layer whose weights are penalised by weight decay. Weight decay limits the growth of a weight unless a larger value is really necessary to reduce the training error. The diagonal weights of uninformative input features therefore tend to stay small, which acts as a soft feature selection and can improve the generalisation ability of the network.
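The diagonal layer with weight decay described above can be illustrated with a minimal sketch. It is not the implementation used in "Papageno": the function names, the layer sizes, the tanh activation, and the L2 form of the penalty are all assumptions made for illustration, since the text only states that a diagonal matrix between the input and the hidden layer is penalised by weight decay.

```python
import numpy as np

def forward(x, d, W1, b1, W2, b2):
    """Forward pass with a diagonal gate between the input and hidden layer."""
    gated = x * d                      # same result as x @ np.diag(d)
    hidden = np.tanh(gated @ W1 + b1)  # tanh is an assumed activation
    return hidden @ W2 + b2

def decay_penalty(d, lam):
    """L2 weight-decay term on the diagonal weights only (assumed form)."""
    return 0.5 * lam * np.sum(d ** 2)

def decay_step(d, grad_d, lr, lam):
    """One gradient-descent step on d for the data loss plus the decay term.

    grad_d is the gradient of the data loss with respect to d;
    lam * d is the gradient of decay_penalty.
    """
    return d - lr * (grad_d + lam * d)

# Hypothetical sizes: 6 input features, 4 hidden units, 3 output classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 6))            # batch of 5 input vectors
d = np.ones(6)                         # one diagonal weight per input feature
W1, b1 = rng.normal(size=(6, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 3)), np.zeros(3)
scores = forward(x, d, W1, b1, W2, b2)  # array of shape (5, 3)
```

In this sketch, a diagonal weight whose data-loss gradient is zero shrinks by the factor `1 - lr * lam` at every step, so features that do not help reduce the error are gradually scaled down rather than removed outright.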