Paper Information - Speech Recognition Using Sparse Discrete Wavelet Decomposition Feature Extraction

In this paper, a new feature extraction algorithm for speech recognition using sparse discrete wavelet decomposition (SDWD) is proposed. The recognition system comprises the following stages: speech data acquisition and preprocessing, speech signal decomposition using the SDWD, feature extraction, and an artificial neural network (ANN) classifier. The SDWD decomposes the speech signal into band signals based on the Mel filter bank frequency specifications. As in the Mel frequency cepstral coefficient (MFCC) method, the logarithms of the filter bank energies are computed, and a discrete cosine transform (DCT) is then applied to these logarithmic values to extract the features. Experimental results using the ANN classifier demonstrate that the proposed SDWD feature extraction algorithm outperforms the MFCC and discrete wavelet packet transform (DWPT) algorithms.
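The final feature step described in the abstract (log of the band energies followed by a DCT) can be illustrated with a minimal sketch. This is not the authors' implementation: the SDWD decomposition itself is not shown, the band signals are assumed to be already available as lists of samples, and the function names `band_log_energies`, `dct_ii`, and `cepstral_features`, the `eps` floor, and the unnormalized DCT-II form are all assumptions made for illustration.

```python
import math


def band_log_energies(bands, eps=1e-12):
    """Return the log energy of each band signal.

    `bands` is a hypothetical input: one sequence of samples per band,
    e.g. the outputs of some band decomposition. `eps` is an assumed
    floor that avoids log(0) for silent bands.
    """
    return [math.log(sum(x * x for x in band) + eps) for band in bands]


def dct_ii(values, num_coeffs=None):
    """Unnormalized DCT-II: c[k] = sum_n v[n] * cos(pi * k * (n + 0.5) / N)."""
    n_total = len(values)
    if num_coeffs is None:
        num_coeffs = n_total
    return [
        sum(
            v * math.cos(math.pi * k * (n + 0.5) / n_total)
            for n, v in enumerate(values)
        )
        for k in range(num_coeffs)
    ]


def cepstral_features(bands, num_coeffs=None):
    """Log band energies followed by a DCT, as in MFCC-style features."""
    return dct_ii(band_log_energies(bands), num_coeffs)
```

Calling `cepstral_features(bands, num_coeffs=13)` on a list of band signals would return the first 13 coefficients. The number of coefficients to keep is a free choice in this sketch, since the abstract does not state one.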
