Quinphone-Based Context-Dependent Acoustic Modeling for LVCSR

Automatic speech recognition (ASR) converts a speech signal into text accurately and efficiently. Typically, the speech signal is processed at the front end to extract feature vectors, which are then scored at the back end using a Gaussian mixture model (GMM). Choosing the number of Gaussian mixture components is important and depends on the size of the dataset. For small vocabularies, triphone-based acoustic modeling (one phone of context on each side) gives good results, but for large vocabularies, quinphone-based acoustic modeling (two phones of context on each side) performs better. This paper compares the performance of context-independent and context-dependent acoustic modeling with the aim of reducing the word error rate.
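
To make the triphone/quinphone distinction concrete, the sketch below expands a monophone string into context-dependent labels with a configurable context width (1 phone per side for triphones, 2 per side for quinphones). This is a minimal illustration, not the paper's implementation; the label format, the padding symbols, and the example phone sequence are all assumptions.

```python
# Sketch: expanding a monophone sequence into context-dependent units.
# Triphones use a +/-1 phone window; quinphones use a +/-2 phone window.
# Label format and sentence-boundary symbols below are illustrative only.

def context_labels(phones, width):
    """Return context-dependent labels with `width` phones of context per side."""
    padded = ["<s>"] * width + list(phones) + ["</s>"] * width
    labels = []
    for i in range(width, width + len(phones)):
        left = "+".join(padded[i - width:i])      # left context phones
        right = "+".join(padded[i + 1:i + 1 + width])  # right context phones
        labels.append(f"{left}-{padded[i]}+{right}")
    return labels

phones = ["sh", "iy", "p"]            # e.g. the word "sheep" (illustrative)
print(context_labels(phones, 1))      # triphone units  (1 left, 1 right)
print(context_labels(phones, 2))      # quinphone units (2 left, 2 right)
```

Widening the context window in this way multiplies the number of distinct units, which is why quinphone systems pay off mainly on large-vocabulary tasks where enough data exists to train (or tie) the extra states.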
