An automatic algorithm for segmenting and labelling a connected digit sequence

Group delay functions provide an alternative representation of signal information. Their main features are additivity and high resolution. The Fourier transform (FT) phase generally appears featureless because of random polarity and phase wrapping. However, the group delay function, which is defined as the negative derivative of the phase, can be processed to derive significant information such as peaks and valleys in the spectral envelope. In this paper, we show an application of the group delay function to the segmentation problem in speech. In the proposed method, a new signal is generated by symmetrising the short-term energy function. The minimum-phase group delay function of this signal is computed, and its valleys correspond to segment boundaries. The proposed technique was tested on manually segmented digit utterances from the TI-DIGITS database. The overall correct segmentation performance is 77.8%. Digit-wise recognition performance on the correctly segmented database is 87.1%.
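
The pipeline summarised above (short-term energy, symmetrisation, minimum-phase group delay, valley picking) can be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' implementation. It assumes the symmetrised energy contour is treated as the magnitude spectrum of a real signal and that the minimum-phase group delay is obtained through the real cepstrum. The function names, the frame parameters and the energy floor are hypothetical, and any smoothing or windowing the original method may apply is omitted.

```python
import numpy as np

def short_term_energy(x, frame_len, hop):
    """Sum of squared samples in each frame (frame_len and hop are assumed values)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.array([np.sum(x[i * hop:i * hop + frame_len] ** 2)
                     for i in range(n_frames)])

def min_phase_group_delay(energy, floor=1e-8):
    """Group delay of the minimum-phase signal whose magnitude spectrum
    is the symmetrised energy contour (assumption: cepstral method)."""
    e = np.asarray(energy, dtype=float)
    e = np.maximum(e / e.max(), floor)        # normalise, avoid log(0)
    # Symmetrise: mirror the contour so it looks like the magnitude
    # spectrum of a real sequence (even length 2N - 2).
    sym = np.concatenate([e, e[-2:0:-1]])
    m = len(sym)
    # Real cepstrum of the log "magnitude spectrum".
    c = np.fft.ifft(np.log(sym)).real
    # Fold onto the causal part to get the minimum-phase complex cepstrum.
    c_hat = np.zeros(m)
    c_hat[0] = c[0]
    c_hat[1:m // 2] = 2.0 * c[1:m // 2]
    c_hat[m // 2] = c[m // 2]
    # For a minimum-phase signal, group delay = sum_n n * c_hat[n] * cos(n * w).
    gd = np.fft.fft(np.arange(m) * c_hat).real
    return gd[:len(e)]                        # one value per energy frame

def valleys(gd):
    """Indices of strict local minima, taken as candidate segment boundaries."""
    inner = (gd[1:-1] < gd[:-2]) & (gd[1:-1] < gd[2:])
    return np.flatnonzero(inner) + 1

# Synthetic energy contour: two "digits" centred at frames 25 and 75.
frames = np.arange(101)
contour = (np.exp(-(frames - 25) ** 2 / 128.0)
           + np.exp(-(frames - 75) ** 2 / 128.0))
gd = min_phase_group_delay(contour)
boundaries = valleys(gd)
```

On this synthetic contour the dip between the two energy bumps, at frame 50, shows up as a valley of the group delay function. On real speech the group delay would normally need smoothing before valley picking, which this sketch leaves out.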
