Phoneme-based word recognition by neural network: a step toward large vocabulary recognition

A neural-network-based word recognition system that can be extended to large-vocabulary isolated word recognition is presented. The system consists of (1) time-delay neural networks (TDNNs) for phoneme spotting and (2) a higher-level network and a dynamic programming (DP) time alignment procedure for word recognition. Each TDNN-based phoneme-spotting network is trained to fire when its particular phoneme is input. A higher-level network then refines these phoneme firing patterns so that they better match an idealized phoneme sequence. To train the higher-level network, DP matching is used to determine the idealized phoneme firing patterns that are nearest to the actual phoneme firings. During recognition, the system selects the most probable word by applying DP matching to the outputs of the higher-level network. Speaker-dependent isolated word recognition experiments show that word recognition rates of around 92% can be achieved for medium-size vocabularies.
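The recognition step described above (DP matching of phoneme firings against each word's phoneme sequence) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that the firings arrive as one row of per-phoneme activations per frame, that each word is a plain list of phoneme indices, that every frame is assigned to exactly one phoneme of the word in order with no skips, and that the word score is the frame-averaged sum of activations along the best path. The names `dp_word_score`, `recognize`, and `lexicon` are hypothetical.

```python
from typing import Dict, List, Sequence

NEG_INF = float("-inf")


def dp_word_score(firings: Sequence[Sequence[float]],
                  phoneme_ids: Sequence[int]) -> float:
    """Score of the best monotonic frame-to-phoneme alignment for one word.

    firings[t][p] is the activation of phoneme p at frame t (assumed layout).
    phoneme_ids is the word's idealized phoneme sequence (assumed format).
    """
    num_frames = len(firings)
    num_phones = len(phoneme_ids)
    if num_phones == 0 or num_frames < num_phones:
        return NEG_INF  # no alignment can visit every phoneme

    # prev[j]: best accumulated score with the current frame on phoneme j.
    prev: List[float] = [NEG_INF] * num_phones
    prev[0] = firings[0][phoneme_ids[0]]  # the path must start on phoneme 0

    for t in range(1, num_frames):
        cur = [NEG_INF] * num_phones
        for j in range(num_phones):
            stay = prev[j]                              # remain on phoneme j
            advance = prev[j - 1] if j > 0 else NEG_INF  # move from j-1 to j
            best = max(stay, advance)
            if best > NEG_INF:
                cur[j] = best + firings[t][phoneme_ids[j]]
        prev = cur

    # The path must end on the last phoneme; average over frames (assumption).
    return prev[-1] / num_frames


def recognize(firings: Sequence[Sequence[float]],
              lexicon: Dict[str, List[int]]) -> str:
    """Return the lexicon word whose phoneme sequence aligns best."""
    return max(lexicon, key=lambda word: dp_word_score(firings, lexicon[word]))
```

Because each word is scored only through its phoneme sequence, adding a word to `lexicon` in this sketch requires no new network outputs, which mirrors the reason a phoneme-based design can scale toward larger vocabularies.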