Segmentation of speech into syllable-like units

In the development of a syllable-centric ASR system, segmentation of the acoustic signal into syllabic units is an important stage. This paper presents a minimum phase group delay based approach to segmenting spontaneous speech into syllable-like units. Three different minimum phase signals are derived from the short-term energy functions of three sub-bands of the speech signal, with each energy function treated as if it were a magnitude spectrum. Experiments are carried out on the Switchboard and OGI-MLTS corpora, and the segmentation error is found to be at most 40 ms for 85% of the syllable segments.
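
The core idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the band edges, frame sizes, or how the three sub-band results are combined, so the values and helper names below (`band_limit`, `short_term_energy`, `energy_group_delay`, `keep_fraction`, the energy floor) are hypothetical. Inverting the energy contour, so that energy valleys become peaks of the pseudo-spectrum, is likewise an assumption made for this illustration.

```python
import numpy as np

def band_limit(x, fs, lo, hi):
    """Crude sub-band filter: zero all FFT bins outside [lo, hi) Hz."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[(f < lo) | (f >= hi)] = 0.0
    return np.fft.irfft(X, n=len(x))

def short_term_energy(x, frame_len=160, hop=80):
    """Sum of squared samples in each frame."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.array([np.sum(x[i * hop:i * hop + frame_len] ** 2)
                     for i in range(n_frames)])

def group_delay(sig, n_fft):
    """Group delay from tau = (X_R*Y_R + X_I*Y_I) / |X|^2, Y = DFT{n*x[n]}."""
    n = np.arange(len(sig))
    X = np.fft.rfft(sig, n_fft)
    Y = np.fft.rfft(n * sig, n_fft)
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + 1e-12)

def energy_group_delay(energy, keep_fraction=0.25):
    """Treat an energy contour as a magnitude spectrum; return its group delay."""
    # Normalise and floor the contour so the inversion below stays bounded.
    e = energy / (energy.max() + 1e-12) + 1e-3
    inv = 1.0 / e                       # assumption: valleys become peaks
    # irfft treats `inv` as the non-negative-frequency half of a real,
    # symmetric spectrum and returns a real, even sequence.
    c = np.fft.irfft(inv)
    # Keep only a tapered causal part of that sequence.
    n_keep = max(2, int(keep_fraction * len(energy)))
    causal = c[:n_keep] * np.hanning(2 * n_keep)[n_keep:]
    # One group-delay value per energy frame.
    return group_delay(causal, 2 * (len(energy) - 1))

# Synthetic input: two noise bursts separated by silence (not real speech).
fs = 8000
rng = np.random.default_rng(0)
burst = rng.standard_normal(fs // 4)
x = np.concatenate([burst, np.zeros(fs // 8), burst])

# Hypothetical band edges in Hz; the abstract does not specify them.
for lo, hi in [(0, 1000), (1000, 2500), (2500, 4001)]:
    gd = energy_group_delay(short_term_energy(band_limit(x, fs, lo, hi)))
    print(f"{lo}-{hi} Hz: {len(gd)} group-delay values")
```

Each sub-band yields one group delay contour with one value per energy frame. Picking the peaks of these contours as candidate syllable boundaries, and merging the candidates across bands, is left out because the abstract does not describe those steps.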
