Continuous speech recognition using joint features derived from the modified group delay function and MFCC

Feature extraction and selection for continuous speech recognition is a complex task. State of the art speech recognition systems use features that are derived by ignoring the Fourier transform phase. In our earlier studies we have shown the efficacy of The Modified Group Delay Feature (MODGDF) derived from the Fourier transform phase for phoneme, syllable and speaker recognition. In this paper we use the MODGDF and the popular MFCC derived from Fourier transform magnitude to compute joint features for continuous speech recognition of two Indian languages Tamil and Telugu. A novel method of segmentation of the continuous speech signal into syllable like units followed by isolated style recognition using HMMs is used. We further use an innovative technique which transforms the problem of detecting the correct string of syllabic units with maximum likelihood to finding an optimal state sequence locally. The recognition system does not use any language models. The MODGDF gave promising recognition performance for the two languages and compared well with the MFCC. Joint features derived using MODGDF and MFCC gave a 10.6% improvement for both Tamil and Telugu languages. The improvement reinforces the hypothesis that MODGDF captures complementary information to that of the MFCC and can be used along with the MFCC to capture the complete information in the speech signal at functional level and help in avoiding heavy auditory and language models.

[1]  Rajesh M. Hegde,et al.  Speaker Identification using the modified group delay feature , 2003 .

[2]  Hema A. Murthy,et al.  Continuous speech recognition using automatically segmented data at syllabic units , 2002, 6th International Conference on Signal Processing, 2002..

[3]  Daniel P. W. Ellis STREAM COMBINATION BEFORE AND/OR AFTER THE ACOUSTIC MODEL , 1999 .

[4]  Hema A. Murthy,et al.  The modified group delay function and its application to phoneme recognition , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[5]  Rajesh M. Hegde,et al.  Application of the modified group delay function to speaker identification and discrimination , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.