The use of nonlinear energy transformation for Tamil connected-digit speech recognition

The input feature vector used by a recognizer for recognition and modeling is generally extended beyond the cepstrum and the peak-normalized energy to include dynamic information: the first- and second-order derivatives of the cepstral features and of the energy. The problem with the energy normalization approach is that it is not suitable for real-time applications, since determining the peak energy introduces long delays. In this paper, we propose a more efficient implementation of energy feature transformation, in which the energy feature is mapped onto a scale of 0 to 1 using a sigmoid function, thereby avoiding the need for energy normalization. Experimental results on a Tamil connected-digit recognition task show that the proposed nonlinear energy transformation scheme yields a 20% reduction in string error rate compared with using the untransformed raw energy feature.
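To make the contrast concrete, the following is a minimal sketch, not the authors' implementation, of a frame-wise sigmoid mapping next to a peak-normalization baseline. The function names, the logistic form, and the `slope` and `midpoint` values are illustrative assumptions; the abstract does not give the parameters actually used.

```python
import math


def sigmoid_energy(log_energy, slope=0.1, midpoint=50.0):
    """Map one frame's log energy onto a 0-to-1 scale with a logistic function.

    slope and midpoint are hypothetical values chosen for illustration only.
    """
    z = slope * (log_energy - midpoint)
    # Numerically stable evaluation of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def peak_normalized(log_energies):
    """Baseline: subtract the utterance peak from every frame.

    The peak is only known once the whole utterance has been seen,
    which is the source of the delay described above.
    """
    peak = max(log_energies)
    return [e - peak for e in log_energies]


# Hypothetical per-frame log energies.
frames = [20.0, 50.0, 80.0]

# The sigmoid mapping works frame by frame, with no look-ahead.
transformed = [sigmoid_energy(e) for e in frames]

# The baseline needs the full list before it can emit any value.
normalized = peak_normalized(frames)
```

Because `sigmoid_energy` depends only on the current frame, it can run as each frame arrives, whereas `peak_normalized` has to wait for the end of the utterance.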
