An enhanced training method for speech recognition in the VODIS project

The authors report on a new embedded training scheme for connected speech recognition, developed for the LOGOS II speech recognizer in the VODIS project. The technique is based on speech segmentation and k-means clustering, and it has been applied successfully to both linear predictive coding (LPC) parameter templates and digital filterbank templates. Speech is segmented by applying Fisher's discriminant algorithm to the time-warped distances between two minimally different connected-word utterances, which are selected using the grammatical rules and the vocabulary of the particular application. The tokens within each word set are clustered using k-means, and the cluster centers in each set are chosen as the reference tokens for that word. One to four cluster centers were used in this investigation. The enhanced training scheme has been applied to connected digit and connected word recognition using the 170-word VODIS vocabulary. Recognition performance improved by between 25% and 34% compared with training on isolated words.
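
The clustering step described above can be illustrated with a minimal sketch. It is not the LOGOS II implementation: it assumes each segmented token has already been reduced to a fixed-length feature vector and uses plain Euclidean distance, whereas a template-based recognizer would more plausibly compare variable-length tokens by time-warped distance. The names `kmeans`, `squared_distance`, `mean_vector`, and `example_tokens`, as well as the evenly spaced initialisation, are hypothetical choices made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(tokens: List[Vector], k: int, max_iter: int = 100) -> List[Vector]:
    """Return k cluster centers for the tokens of one word.

    Assumption: tokens are fixed-length feature vectors. The centers
    returned here play the role of the reference tokens for the word.
    """
    if not 1 <= k <= len(tokens):
        raise ValueError("k must be between 1 and the number of tokens")

    # Deterministic initialisation (an illustrative choice): pick k tokens
    # spaced evenly through the list.
    step = len(tokens) / k
    centers = [list(tokens[int(i * step)]) for i in range(k)]

    for _ in range(max_iter):
        # Assignment step: attach each token to its nearest center.
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for tok in tokens:
            nearest = min(range(k), key=lambda j: squared_distance(tok, centers[j]))
            clusters[nearest].append(tok)

        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            mean_vector(cluster) if cluster else centers[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers

    return centers


# Hypothetical two-dimensional tokens for one word, forming two groups.
example_tokens = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
two_references = kmeans(example_tokens, k=2)
one_reference = kmeans(example_tokens, k=1)
```

Running `kmeans` with `k` from 1 to 4 on the tokens of a word would give the one to four reference tokens per word mentioned in the abstract; with `k=1` the single reference is simply the mean of all tokens.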