An overview: Context-dependent acoustic modeling for LVCSR

Automatic speech recognition (ASR) converts a speech signal into text accurately and efficiently. Typically, the speech signal is processed at the front end to extract features, which are then scored at the back end against a Gaussian mixture model (GMM); selecting an appropriate number of GMM mixture components for the size of the dataset is important. For small vocabularies, triphone-based acoustic modeling (one phone of context on each side) gives good results, but for large vocabularies, quinphone-based acoustic modeling, which extends the context to two phones on each side, is expected to perform better. This paper presents an overview of quinphone-based acoustic modeling aimed at reducing the error rate.
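The difference between triphone and quinphone units can be sketched as follows. This is an illustrative example, not code from the paper: the `L-C+R` label notation and the `sil` padding symbol are common conventions, but the exact formatting is an assumption.

```python
def context_units(phones, width):
    """Build context-dependent unit labels from a phone sequence.

    width=1 yields triphones (one phone of context on each side);
    width=2 yields quinphones (two phones of context on each side).
    Sequence boundaries are padded with a silence symbol "sil"
    (an illustrative convention, not prescribed by the paper).
    """
    padded = ["sil"] * width + list(phones) + ["sil"] * width
    units = []
    for i in range(width, len(padded) - width):
        left = "_".join(padded[i - width:i])
        right = "_".join(padded[i + 1:i + 1 + width])
        units.append(f"{left}-{padded[i]}+{right}")
    return units

# Example: the word "cat" as /k ae t/
print(context_units(["k", "ae", "t"], 1))
# → ['sil-k+ae', 'k-ae+t', 'ae-t+sil']
print(context_units(["k", "ae", "t"], 2))
# → ['sil_sil-k+ae_t', 'sil_k-ae+t_sil', 'k_ae-t+sil_sil']
```

The widened context is why quinphones suit large vocabularies: with N base phones, the number of possible quinphone units grows as N^5 rather than the N^3 of triphones, so they capture more coarticulation but require large training sets (and parameter-sharing techniques such as state tying) to estimate reliably.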