Speech recognition using time‐delay neural networks

A time‐delay neural network (TDNN) approach to speech recognition is presented that is characterized by two important properties: (1) using multilayer arrangements of simple computing units, a TDNN can represent arbitrary nonlinear classification decision surfaces, which are learned automatically using error back propagation; (2) the time‐delay arrangement enables the network to discover acoustic‐phonetic features and the temporal relationships between them independently of their position in time, so that they are not blurred by temporal shifts in the input. The TDNNs are compared with the currently most popular technique in speech recognition, hidden Markov models (HMMs). Extensive performance evaluation shows that the TDNN recognizes voiced stops extracted from varying phonetic contexts at an error rate four times lower (1.5% vs 6.3%) than the best of our HMMs. To perform this task, the TDNN “invented” well‐known acoustic‐phonetic features (e.g., F2 rise, F2 fall, vowel onset) as useful abstractions. It also developed a...
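
The shift-independence property described in (2) can be illustrated with a minimal sketch. It assumes a single time-delay layer whose units apply the same weights to every window of consecutive frames, followed by a sum of each unit's activations over time. The function names, layer sizes, and random weights below are hypothetical choices made for illustration and are not taken from the evaluated networks; no training is shown.

```python
import math
import random


def time_delay_layer(frames, weights, bias):
    """Apply one bank of time-delay units to a sequence of feature frames.

    frames:  list of T frames, each a list of F floats
    weights: weights[u][d][f] for unit u, delay d, feature f
    bias:    one bias value per unit
    Returns T - D + 1 output frames (D = number of delays),
    each holding one sigmoid activation per unit.
    """
    n_delays = len(weights[0])
    outputs = []
    for t in range(len(frames) - n_delays + 1):
        row = []
        for unit_weights, unit_bias in zip(weights, bias):
            total = unit_bias
            # The same weights are reused at every time step t.
            for d in range(n_delays):
                for x, w in zip(frames[t + d], unit_weights[d]):
                    total += w * x
            row.append(1.0 / (1.0 + math.exp(-total)))
        outputs.append(row)
    return outputs


def integrate_over_time(activations):
    """Sum each unit's activation over all time steps."""
    return [sum(column) for column in zip(*activations)]


def embed(pattern, offset, length, n_features):
    """Place a short pattern at `offset` inside an all-zero sequence."""
    frames = [[0.0] * n_features for _ in range(length)]
    for i, frame in enumerate(pattern):
        frames[offset + i] = list(frame)
    return frames


# Illustrative sizes only (assumptions, not values from the source).
N_FEATURES, N_DELAYS, N_UNITS, LENGTH = 4, 3, 2, 12

rng = random.Random(0)
weights = [[[rng.uniform(-1, 1) for _ in range(N_FEATURES)]
            for _ in range(N_DELAYS)] for _ in range(N_UNITS)]
bias = [rng.uniform(-1, 1) for _ in range(N_UNITS)]
pattern = [[rng.uniform(0, 1) for _ in range(N_FEATURES)] for _ in range(3)]

# The same acoustic event, occurring two frames later in the second input.
early = time_delay_layer(embed(pattern, 3, LENGTH, N_FEATURES), weights, bias)
late = time_delay_layer(embed(pattern, 5, LENGTH, N_FEATURES), weights, bias)

pooled_early = integrate_over_time(early)
pooled_late = integrate_over_time(late)
```

In this sketch the hidden activations for the later input are the earlier ones delayed by two frames, and the time-integrated values agree up to floating-point rounding, provided the event sits at least `N_DELAYS - 1` frames away from either end of the sequence.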