Turkish Large Vocabulary Continuous Speech Recognition Using a Limited Audio Corpus

In this paper, the recognition performance of several methodologies proposed for Turkish Large Vocabulary Continuous Speech Recognition is evaluated using a limited audio corpus. Word-based, stem-based, stem-ending-based, and morph-based language models are used with different n-gram orders. The word-based and stem-ending-based language models are extended using several approaches. In addition, a hybrid language model based on the word-based and stem-ending-based language models is proposed. The word-based language model is observed to outperform the sub-word language models when a limited audio corpus is used.
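To make the distinction between word units and stem-ending units concrete, the following minimal sketch tokenizes one Turkish sentence both ways and counts n-grams over each unit sequence. It is an illustration under stated assumptions, not the paper's implementation: the segmentation table `SEGMENTS`, the `+` marker on endings, and the function names `word_units`, `stem_ending_units`, and `ngram_counts` are all hypothetical, and a real system would obtain the segmentations from a morphological analyzer rather than a hand-written table.

```python
from collections import Counter

# Hypothetical hand-written segmentation table mapping a word to (stem, ending).
# A real system would derive this from a morphological analyzer.
SEGMENTS = {
    "evlerde": ("ev", "lerde"),    # "in the houses"
    "kitaplar": ("kitap", "lar"),  # "books"
    "var": ("var", ""),            # "there is/are", no ending
}


def word_units(sentence):
    """Split a sentence into whole-word units."""
    return sentence.split()


def stem_ending_units(sentence, segments=SEGMENTS):
    """Split each word into a stem and, if present, a '+'-marked ending."""
    units = []
    for word in sentence.split():
        # Words missing from the table are kept whole (assumption).
        stem, ending = segments.get(word, (word, ""))
        units.append(stem)
        if ending:
            units.append("+" + ending)
    return units


def ngram_counts(units, n):
    """Count n-grams over a unit sequence padded with sentence markers."""
    padded = ["<s>"] * (n - 1) + list(units) + ["</s>"]
    return Counter(tuple(padded[i:i + n]) for i in range(len(padded) - n + 1))


if __name__ == "__main__":
    sentence = "evlerde kitaplar var"
    for name, units in [("word", word_units(sentence)),
                        ("stem-ending", stem_ending_units(sentence))]:
        print(name, units, dict(ngram_counts(units, 2)))
```

The sketch shows that the stem-ending sequence is longer than the word sequence for the same sentence, so an n-gram of a given order spans less of the sentence when sub-word units are used. This is one reason the n-gram order is varied when comparing unit types.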