Decision tree-based clustering of phonetic context, dimension, and state position for acoustic modeling in continuous speech recognition

Context-dependent hidden Markov models, which take a phone's preceding and succeeding phonetic context into account, have recently become the standard acoustic models in continuous speech recognition systems. However, context dependence greatly increases the total number of models, and the resulting system contains so many free parameters that they are difficult to estimate reliably from the observed statistics. Parameter-tying methods, in which parameters are shared among models, have therefore been proposed; among them, state tying based on phonetic decision trees has proved particularly effective. Because such methods treat all dimensions of the feature vector as a single unit for each state, tying every dimension simultaneously, they cannot construct a different parameter-sharing structure for each individual dimension, and consequently cannot assign an appropriate number of parameters to each one. In this paper, we introduce a method for partitioning the feature dimensions on the basis of the minimum description length (MDL) criterion and extend phonetic decision trees into a clustering method that handles both phonetic context and dimension. By adding a further splitting condition on state position, we also propose a method that simultaneously clusters phonetic context, dimension, and state position using decision trees. In speaker-independent continuous speech recognition experiments, the proposed method reduces the error rate by 13 to 15 percent compared with previous state-tying methods based on phonetic decision trees. © 2005 Wiley Periodicals, Inc. Syst Comp Jpn, 36(14): 44–55, 2005; Published online in Wiley InterScience. DOI 10.1002/scj.20357
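
As a rough illustration of the clustering criterion described above, the following Python sketch shows how an MDL-based decision-tree split test can be evaluated separately for each feature dimension of diagonal-covariance Gaussian states. It is a minimal sketch under simplifying assumptions (single-Gaussian states, a fixed per-split parameter penalty), not the authors' implementation; the class and function names (StateStats, mdl_split_gain, and so on) are illustrative.

    import numpy as np

    # Minimal sketch of MDL-based decision-tree state clustering with
    # diagonal-covariance Gaussians.  For a cluster S of tied states, the
    # data-coding term of the description length used here is, per dimension d,
    #     0.5 * gamma(S) * log var_d(S)
    # and a phonetic question is accepted only where splitting reduces the
    # total description length.  Keeping this term per dimension is what
    # allows a different tying structure for each dimension.

    class StateStats:
        """Occupancy-weighted sufficient statistics of one context-dependent state."""
        def __init__(self, occ, mean, var):
            self.occ = occ                                  # frame occupancy gamma
            self.mean = np.asarray(mean, dtype=float)       # per-dimension mean
            self.var = np.asarray(var, dtype=float)         # per-dimension variance

    def pooled_log_var(states):
        """Per-dimension log of the pooled variance of a cluster of states."""
        occ = sum(s.occ for s in states)
        mean = sum(s.occ * s.mean for s in states) / occ
        second = sum(s.occ * (s.var + s.mean ** 2) for s in states) / occ
        return occ, np.log(np.maximum(second - mean ** 2, 1e-8))

    def dl_likelihood_term(states):
        """Data-coding term 0.5 * gamma(S) * log var_d(S), returned per dimension."""
        occ, logvar = pooled_log_var(states)
        return 0.5 * occ * logvar

    def mdl_split_gain(states, question, total_occ):
        """Per-dimension description-length reduction for one yes/no question."""
        yes = [s for s in states if question(s)]
        no = [s for s in states if not question(s)]
        if not yes or not no:
            return None
        gain = (dl_likelihood_term(states)
                - dl_likelihood_term(yes)
                - dl_likelihood_term(no))
        # Splitting adds one mean and one variance per dimension, so the
        # per-dimension MDL penalty increase is (2/2) * log(total occupancy).
        penalty = np.log(total_occ)
        return gain - penalty      # split where this is positive

    # Hypothetical usage: evaluate one candidate question on two toy states.
    states = [StateStats(10.0, [0.0, 1.0], [1.0, 1.0]),
              StateStats(12.0, [2.0, 1.1], [1.0, 0.9])]
    total = sum(s.occ for s in states)
    q_first = lambda s: s is states[0]     # stands in for a phonetic question
    print(mdl_split_gain(states, q_first, total))

Summing the returned per-dimension gains recovers the conventional whole-vector MDL split test; keeping them separate is what permits a different tying structure, and hence a different number of parameters, for each dimension, in the spirit of the method proposed above.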
