On Language Modelling in Automatic Speech Recognition

Quantitative linguistics is expected to supply linguistic arguments that can help to develop and improve the mathematical procedures used in speech recognition modelling, especially those based on Hidden Markov Models, and to apply them to data from various languages. In the present paper, the author examines the advantages and limitations of (a) adding more information, namely part-of-speech tags, to the training dictionary of word forms, and (b) adding more information about the rhythm of the clause, in terms of the probability distribution of syllable bigrams. The language analyzed is Czech.