Mel-scaled Discrete Wavelet Transform and dynamic features for the Persian phoneme recognition

In this paper we use a feature vector of Mel Frequency Discrete Wavelet Coefficients (MFDWC) to recognize spoken phonemes in the Persian language. Wavelets are used for feature extraction because of their multiresolution analysis and their localization in both the time and frequency domains. The MFDWCs are obtained by applying the Discrete Wavelet Transform (DWT) to the Mel-scaled log filter bank energies of a speech frame. The feature vectors are used for HMM-based phoneme recognition on a portion of the FarsDat Persian language database consisting of 35 hours of recorded data for training and 15 hours for testing. We evaluate the new features on clean and noisy speech and compare them with Mel Frequency Cepstral Coefficients (MFCC). In phone recognition experiments, the MFDWC-based recognizer gives better results than the MFCC-based recognizer for both clean speech and speech corrupted by white noise.
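The MFDWC pipeline described above (Mel-scaled log filter bank energies followed by a DWT) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the Haar wavelet, the 3 decomposition levels, the 32 filters, the 16 kHz sample rate, the 512-point FFT, the Hamming window, and the function names `mel_filterbank`, `haar_dwt`, and `mfdwc`. The dynamic features are approximated here with a simple frame-to-frame gradient, which may differ from the regression the authors used.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale (hypothetical helper)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i, k] = (right - k) / (right - center)
    return fbank

def haar_dwt(x, levels):
    """Multi-level orthonormal Haar DWT; len(x) must be divisible by 2**levels.

    Haar is an assumption: the abstract does not name the wavelet used.
    Returns [approximation, coarsest detail, ..., finest detail] concatenated.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return np.concatenate([approx] + details[::-1])

def mfdwc(frame, sample_rate=16000, n_fft=512, n_filters=32, levels=3):
    """DWT of the Mel-scaled log filter bank energies of one speech frame."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    log_energies = np.log(energies + 1e-10)  # small floor avoids log(0)
    return haar_dwt(log_energies, levels)

# Usage on synthetic noise frames (25 ms at 16 kHz = 400 samples each).
rng = np.random.default_rng(0)
frames = rng.standard_normal((5, 400))
static = np.array([mfdwc(f) for f in frames])
dynamic = np.gradient(static, axis=0)      # simple stand-in for delta features
features = np.hstack([static, dynamic])
print(features.shape)  # (5, 64)
```

For comparison, MFCC features would replace the `haar_dwt` step with a discrete cosine transform of the same log energies. Each DCT basis function spans the whole frequency axis, whereas each wavelet coefficient depends only on a local group of filter bank channels, which is the localization property the abstract refers to.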
