Segment boundary estimation using recurrent neural networks

This paper describes a segment (e.g. phoneme) boundary estimation method based on recurrent neural networks (RNNs). The proposed method only requires acoustic observations to accurately estimate segment boundaries. Experimental results show that the proposed method can estimate segment boundaries signi cantly better than an HMM based method. Furthermore, we incorporate the RNN based segment boundary estimator into the HMM based and segment based recognition systems. As a result, the segment boundary estimates give useful information for reducing computational complexity and improving recognition performance.

[1]  Shigeki Sagayama,et al.  A successive state splitting algorithm for efficient allophone modeling , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[3]  Yoshinori Sagisaka,et al.  Spontaneous dialogue speech recognition using cross-word context constrained word graphs , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[4]  M. Schuster Learning out of time series with an extended recurrent neural network , 1996, Neural Networks for Signal Processing VI. Proceedings of the 1996 IEEE Signal Processing Society Workshop.

[5]  Michael Riley,et al.  Automatic segmentation and labeling of speech , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[6]  James R. Glass,et al.  A probabilistic framework for feature-based speech recognition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[7]  Kuldip K. Paliwal,et al.  Model parameter estimation for mixture density polynomial segment models , 1998, Comput. Speech Lang..

[8]  Steven Greenberg,et al.  Integrating syllable boundary information into speech recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9]  Torbjørn Svendsen,et al.  On the automatic segmentation of speech signals , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10]  Yifan Gong,et al.  The importance of segmentation probability in segment based speech recognizers , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[11]  Herbert Gish,et al.  A segmental speech model with applications to word spotting , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.