Simultaneous clustering of phonetic context, dimension, and state position for acoustic modeling using decision trees

Recently, context-dependent hidden Markov models (HMMs), which take a phone's preceding and succeeding phonetic context into account, have become widely used as acoustic models in continuous speech recognition systems. However, context dependence greatly increases the total number of models, and the resulting system has so many free parameters that they are difficult to estimate reliably from the observed data. Parameter-tying methods, in which parameters are shared between models, have been proposed to address this, and state tying based on decision trees has proved particularly effective. In such methods, however, the unit of tying is the state with all dimensions of the feature vector taken together, so all dimensions are tied simultaneously. As a result, it is not possible to construct a different parameter-sharing structure for each dimension, or to assign an appropriate number of parameters to each one. Here, we extend phonetic decision trees by introducing a method for partitioning the feature dimensions on the basis of the minimum description length criterion, and propose a decision tree clustering method that handles both phonetic context and dimension. In addition, by adding a split condition related to state position, we propose a method for simultaneously clustering phonetic context, dimension, and state position using decision trees. We show that in speaker-independent continuous speech recognition the proposed method reduces the error rate by 13 to 15 percent compared with previous state-tying methods based on phonetic decision trees. © 2005 Wiley Periodicals, Inc. Syst Comp Jpn, 36(14): 44–55, 2005; Published online in Wiley InterScience. DOI 10.1002/scj.20357
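The idea of deciding splits separately for each feature dimension under a minimum description length (MDL) criterion can be illustrated with a small sketch. It is not the authors' algorithm: it assumes one scalar Gaussian per node and dimension, a penalty of log N for the two parameters (mean and variance) that an extra leaf adds, and toy data. The names `gaussian_nll`, `mdl_split_delta`, and `penalty_weight` are hypothetical.

```python
import math


def gaussian_nll(samples):
    """Negative log-likelihood of scalar samples under their own ML Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    # Floor the variance so that identical samples do not yield log(0).
    var = max(sum((x - mean) ** 2 for x in samples) / n, 1e-8)
    return 0.5 * n * (1.0 + math.log(2.0 * math.pi) + math.log(var))


def mdl_split_delta(yes, no, total_count, penalty_weight=1.0):
    """Change in description length if one node is split into `yes` and `no`.

    A negative value means the split shortens the description length.
    Assumption: the extra leaf adds two parameters, costing (2/2) * log N.
    """
    parent = yes + no
    delta_nll = gaussian_nll(yes) + gaussian_nll(no) - gaussian_nll(parent)
    penalty = penalty_weight * math.log(total_count)
    return delta_nll + penalty


# Toy frames for one state: two context groups, two feature dimensions.
# Dimension 0 differs between the groups; dimension 1 does not.
yes_frames = [(0.0, 1.0), (0.1, 1.2), (-0.1, 0.8), (0.05, 1.1), (-0.05, 0.9)]
no_frames = [(5.0, 1.0), (5.1, 1.2), (4.9, 0.8), (5.05, 1.1), (4.95, 0.9)]
total = len(yes_frames) + len(no_frames)

# Evaluate the same phonetic question independently for each dimension.
deltas = [
    mdl_split_delta([f[d] for f in yes_frames], [f[d] for f in no_frames], total)
    for d in range(2)
]
split_dims = [d for d, delta in enumerate(deltas) if delta < 0]
```

In this toy case only dimension 0 is split, so the two context groups keep separate parameters in dimension 0 while sharing them in dimension 1. Conventional state tying, which treats all dimensions as one unit, cannot express that kind of structure.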