Speech Recognition Using Acoustic Similarity-Based Primitives

This paper proposes an algorithm that automatically acquires new recognition primitives by splitting the training samples so that the likelihood of the whole training set with respect to the model is maximized. Each primitive obtained by the proposed method integrates several temporally contiguous phonemes and is called an acoustic similarity-based primitive (ASP). The proposed algorithm carries out ASP acquisition and ASP modeling by HMnet in parallel, so that the two are optimized simultaneously. In speaker-dependent phoneme recognition experiments with six speakers, the recognition rate improved by approximately 3.5% on average over the conventional phoneme HMnet, with an average increase of approximately 14.6% in the number of candidates. A method for incorporating ASPs into a large-vocabulary continuous speech recognition system is also proposed. In a speaker-dependent recognition experiment with eight speakers, word recognition accuracy improved by approximately 2.6% over the phoneme HMnet.

© 2001 Scripta Technica, Syst Comp Jpn, 33(1): 8–17, 2002
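The splitting criterion (choose the segmentation of a training sample that maximizes total likelihood) can be illustrated with a short dynamic-programming sketch. This is not the authors' algorithm. The paper jointly re-estimates HMnet models while acquiring ASPs, whereas this sketch only shows the segmentation step under the assumption that a per-segment log-likelihood is already available. The names `best_split`, `seg_loglik`, and `max_len`, the use of dynamic programming, and the toy scores are all hypothetical choices made for illustration.

```python
import math
from typing import Callable, Optional, Sequence


def best_split(
    phonemes: Sequence[str],
    seg_loglik: Callable[[tuple], Optional[float]],
    max_len: int = 3,
) -> tuple[float, list[tuple]]:
    """Split a phoneme sequence into contiguous segments (candidate primitives)
    so that the summed segment log-likelihood is maximal.

    Hypothetical interface: seg_loglik returns the log-likelihood of a segment
    under its model, or None if no model exists for that segment.
    max_len limits how many phonemes one segment may span.
    """
    n = len(phonemes)
    best = [-math.inf] * (n + 1)  # best[i]: best total score for phonemes[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last segment
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if best[start] == -math.inf:
                continue  # prefix phonemes[:start] cannot be segmented
            score = seg_loglik(tuple(phonemes[start:end]))
            if score is None:
                continue  # no model for this segment
            total = best[start] + score
            if total > best[end]:
                best[end] = total
                back[end] = start
    if best[n] == -math.inf:
        raise ValueError("no valid segmentation")
    # Trace back the chosen segment boundaries.
    segments = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(tuple(phonemes[start:end]))
        end = start
    segments.reverse()
    return best[n], segments


# Toy log-likelihoods (invented for illustration only).
table = {("a",): -2.0, ("k",): -2.0, ("a", "k"): -3.0,
         ("k", "a"): -3.5, ("a", "k", "a"): -6.0}
score, segs = best_split(["a", "k", "a"], table.get)
print(score, segs)
```

With these toy scores, merging "a k" into one segment and keeping the final "a" separate (total -5.0) beats both the all-phoneme split (-6.0) and the single three-phoneme segment (-6.0). In the paper's setting, the segment models would themselves be re-estimated after each split, which this fixed-score sketch does not attempt.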