Wavelet Sub-Band Based Temporal Features for Robust Hindi Phoneme Recognition

This paper proposes a wavelet transform-based feature extraction technique for Hindi speech recognition. The proposed features capture temporal as well as frequency-band energy variations for the task of Hindi phoneme recognition. Their recognition performance is compared with that of standard MFCC features and 24-band admissible wavelet packet-based features, using a classifier based on linear discriminant functions. To evaluate the robustness of these features, different types of noise from the NOISEX database are added to the phonemes at signal-to-noise ratios ranging from 20 dB down to -5 dB. The recognition results show that, under noisy conditions, the proposed technique always achieves better performance than MFCC-based features.
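The noise-mixing step of the robustness evaluation can be illustrated with a minimal sketch. It is not the authors' code: the helper names `mean_power`, `mix_at_snr`, and `measured_snr_db` are hypothetical, the sine tone and Gaussian noise stand in for a phoneme segment and a NOISEX recording, and the mean-power definition of SNR is an assumption, since the abstract does not say how SNR was measured. The noise gain g follows from SNR_dB = 10 log10(P_s / (g^2 P_n)).

```python
import math
import random


def mean_power(x):
    """Average power (mean of squared samples) of a sequence."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB.

    Assumption: SNR is defined over the whole segment using mean power.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Repeat (or trim) the noise so it covers the whole speech segment.
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")
    # Solve SNR_dB = 10*log10(p_speech / (gain**2 * p_noise)) for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR in dB of a mixture, given the clean reference signal."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder signals: a 440 Hz tone sampled at 16 kHz, and white noise.
    speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(800)]
    for target in (20, 10, 0, -5):
        noisy = mix_at_snr(speech, noise, target)
        print(target, round(measured_snr_db(speech, noisy), 3))
```

Scaling the noise rather than the speech leaves the phoneme amplitude unchanged across conditions, so only the noise level differs between the 20 dB and -5 dB cases.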
