Speech Feature Evaluation for Bangla Automatic Speech Recognition

This chapter presents Automatic Speech Recognition (ASR) techniques for Bangla (widely known as Bengali) by evaluating different speech features: Mel Frequency Cepstral Coefficients (MFCCs), Local Features (LFs), and phoneme probabilities extracted by time-delay artificial neural networks of different architectures. Speech features are also canonicalized for Gender-Independent (GI) ASR. In the canonicalization process, the authors design three classifiers, trained on male, female, and GI speakers respectively, and extract the output probabilities of each classifier to find the maximum. Selecting the maximum output probability for each speech file yields higher correctness and accuracy for GI speech recognition. In addition, dynamic parameters (velocity and acceleration coefficients) are used in the experiments to obtain higher phoneme recognition accuracy. The experiments show that combining dynamic parameters with hybrid features increases phoneme recognition performance to a certain extent. These parameters not only increase the accuracy of the ASR system but also reduce the computational complexity of Hidden Markov Model (HMM)-based classifiers, which then need fewer mixture components.
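The two mechanisms named above, dynamic parameters and maximum-probability selection across the male, female, and GI classifiers, can be illustrated with a minimal sketch. It is not the authors' implementation. The regression formula for the velocity coefficients is a commonly used one and is an assumption here, as are the window size, the function names (`delta`, `add_dynamic_parameters`, `pick_best_classifier`), and the example scores, none of which come from the chapter.

```python
from typing import Dict, List, Sequence, Tuple


def delta(frames: Sequence[Sequence[float]], window: int = 2) -> List[List[float]]:
    """Regression-based delta coefficients (assumed formula, not from the chapter).

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with the first and last frames repeated at the edges.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    num_frames = len(frames)
    if num_frames == 0:
        return []
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    dim = len(frames[0])
    out: List[List[float]] = []
    for t in range(num_frames):
        row = []
        for k in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def add_dynamic_parameters(
    frames: Sequence[Sequence[float]], window: int = 2
) -> List[List[float]]:
    """Append velocity (delta) and acceleration (delta of delta) to each frame."""
    velocity = delta(frames, window)
    acceleration = delta(velocity, window)
    return [list(s) + v + a for s, v, a in zip(frames, velocity, acceleration)]


def pick_best_classifier(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the classifier whose output score for one speech file is largest."""
    name = max(scores, key=scores.get)
    return name, scores[name]


if __name__ == "__main__":
    # Toy one-dimensional static features, one value per frame (hypothetical).
    static = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    print(add_dynamic_parameters(static, window=1))
    # Hypothetical per-file log-likelihoods from the three classifiers.
    print(pick_best_classifier({"male": -120.5, "female": -98.2, "gi": -101.0}))
```

In this sketch each static frame triples in length once the velocity and acceleration coefficients are appended, and the recognition result for a file would be taken from whichever of the three classifiers reports the highest score.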
