An efficient incremental likelihood evaluation for polynomial trajectory model with application to model training and recognition

The polynomial segment model (PSM), first proposed in Gish et al. (1993) and subsequently studied by other researchers, has opened up an alternative research direction for speech recognition. In PSM, the speech frames within a segment are modeled jointly, so any change in the segment boundaries requires recomputing the likelihood of all the frames within the segment. While estimating the best segment boundaries is possible, the computational cost typically restricts the search to a neighborhood of some pre-segmentation, usually obtained from another model such as an HMM, which in effect limits the possibility of using PSM on its own. In this paper we introduce a new approach that evaluates the likelihood of a PSM segment by efficiently "accumulating" the segment likelihood incrementally, i.e. one frame at a time. Based on this incremental likelihood evaluation, efficient PSM search and training algorithms are also introduced. We show the effectiveness of the incremental likelihood evaluation by building a PSM-based TIMIT recognition system (both training and test) without needing another model for pre-segmentation.
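To make the idea of accumulating a segment likelihood one frame at a time concrete, the following Python sketch shows one way it could be done. It is not taken from the paper and may differ from the authors' formulation. It assumes a scalar feature, a fixed polynomial mean trajectory with coefficients b_0..b_R, a fixed Gaussian variance, and frame times normalized to [0, 1] by the segment length. The class and function names are hypothetical. The sketch keeps sums of raw-time moments, which can be updated in constant time per frame, and applies the length-dependent time normalization only when the likelihood is evaluated.

```python
import math


class IncrementalSegmentLikelihood:
    """Accumulates raw-time moments so the log-likelihood of a growing
    segment can be evaluated without revisiting earlier frames.
    Illustrative only: scalar feature, fixed coefficients and variance."""

    def __init__(self, coeffs, variance):
        self.b = list(coeffs)             # assumed mean coefficients b_0..b_R
        self.var = variance               # assumed fixed residual variance
        r = len(self.b) - 1
        self.n = 0                        # number of frames seen so far
        self.sum_yy = 0.0                 # sum of y_t^2
        self.sum_ty = [0.0] * (r + 1)     # sum of t^k * y_t for k = 0..R
        self.sum_t = [0.0] * (2 * r + 1)  # sum of t^m for m = 0..2R

    def add_frame(self, y):
        t = float(self.n)                 # raw frame index 0, 1, 2, ...
        self.sum_yy += y * y
        for k in range(len(self.sum_ty)):
            self.sum_ty[k] += (t ** k) * y
        for m in range(len(self.sum_t)):
            self.sum_t[m] += t ** m
        self.n += 1

    def log_likelihood(self):
        n = self.n
        s = float(max(n - 1, 1))          # raw index t maps to t / s in [0, 1]
        # Expand sum_t (y_t - sum_k b_k (t/s)^k)^2 in terms of the stored sums.
        cross = sum(bk * self.sum_ty[k] / s ** k
                    for k, bk in enumerate(self.b))
        quad = sum(bj * bk * self.sum_t[j + k] / s ** (j + k)
                   for j, bj in enumerate(self.b)
                   for k, bk in enumerate(self.b))
        sq_err = self.sum_yy - 2.0 * cross + quad
        return (-0.5 * n * math.log(2.0 * math.pi * self.var)
                - sq_err / (2.0 * self.var))


def direct_log_likelihood(frames, coeffs, variance):
    """Reference evaluation that revisits every frame of the segment."""
    n = len(frames)
    s = float(max(n - 1, 1))
    sq_err = 0.0
    for t, y in enumerate(frames):
        mean = sum(bk * (t / s) ** k for k, bk in enumerate(coeffs))
        sq_err += (y - mean) ** 2
    return (-0.5 * n * math.log(2.0 * math.pi * variance)
            - sq_err / (2.0 * variance))
```

Calling `add_frame` once per incoming frame and `log_likelihood` after each call gives the same value as `direct_log_likelihood` on the frames seen so far, but the per-frame cost depends only on the polynomial order, not on the segment length. For long segments or high polynomial orders the raw-time moments grow quickly, so a practical implementation would likely need attention to numerical precision.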

[1] Victor Zue, et al. Speech database development at MIT: TIMIT and beyond, 1990, Speech Commun.

[2] Martin J. Russell, et al. Probabilistic-trajectory segmental HMMs, 1999, Comput. Speech Lang.

[3] Herbert Gish, et al. A segmental speech model with applications to word spotting, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4] Mari Ostendorf, et al. From HMM's to segment models: a unified view of stochastic modeling for speech recognition, 1996, IEEE Trans. Speech Audio Process.

[5] Herbert Gish, et al. Parametric trajectory mixtures for LVCSR, 1998, ICSLP.

[6] Kai-Fu Lee, et al. Automatic Speech Recognition, 1989.

[7] Xiaodong Sun, et al. Speech recognition using hidden Markov models with polynomial regression functions as nonstationary states, 1994, IEEE Trans. Speech Audio Process.

[8] Yung-Hwan Oh, et al. A segmental-feature HMM using parametric trajectory model, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).