Syllable structure based phonetic units for context-dependent continuous Thai speech recognition

The choice of phonetic units for a speech recognizer greatly affects system performance. Phonetic units are normally defined according to the acoustic properties of speech. However, because training data are limited, very fine-grained acoustic properties are usually ignored. Syllable structure is one such property: it is usually ignored in English phonetic units because of the large number of possible onsets and codas. Some languages, such as Chinese, benefit from incorporating syllable structure information into the phonetic units, because the language is naturally syllabic and has only a small number of subsegments (onsets, nuclei, and codas). Thai lies between English and Chinese: it has more subsegments than Chinese but fewer than English. The work in this paper consists of two main steps. First, we show that Thai phonetic units can be defined as a set of syllabic elements without causing a data sparseness problem. Second, we demonstrate that syllable structure based phonetic units give better recognition accuracy by integrating syllable structure information, and that they greatly reduce the number of triphone units because the syllable structure constrains the left and right contexts.
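
The reduction in triphone units described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny inventory is hypothetical and is not the Thai inventory used in the paper, the syllable template is simplified to onset + nucleus + optional coda, and silence, tones, clusters, and utterance boundaries are ignored. The function names are likewise hypothetical. The sketch only shows why tagging each unit with its syllable position limits which left and right contexts can occur.

```python
from itertools import product

# Hypothetical toy inventory, NOT the Thai inventory from the paper.
ONSETS = ["k", "t", "m"]
NUCLEI = ["a", "i"]
CODAS = ["n", "k"]

# Syllable-position-tagged units, e.g. ("k", "onset") and ("k", "coda")
# are treated as two distinct units.
UNITS = (
    [(p, "onset") for p in ONSETS]
    + [(p, "nucleus") for p in NUCLEI]
    + [(p, "coda") for p in CODAS]
)

# Assumed position transitions in continuous speech for the simplified
# template onset + nucleus + optional coda:
#   onset -> nucleus            (inside a syllable)
#   nucleus -> coda             (inside a syllable)
#   nucleus -> onset            (open syllable followed by the next syllable)
#   coda -> onset               (closed syllable followed by the next syllable)
ALLOWED = {
    ("onset", "nucleus"),
    ("nucleus", "coda"),
    ("nucleus", "onset"),
    ("coda", "onset"),
}


def count_constrained_triphones(units=UNITS, allowed=ALLOWED):
    """Count (left, center, right) triples permitted by the position transitions."""
    count = 0
    for left, center, right in product(units, repeat=3):
        if (left[1], center[1]) in allowed and (center[1], right[1]) in allowed:
            count += 1
    return count


def count_unconstrained_triphones(units=UNITS):
    """Count all triples over the position-independent phone set."""
    phones = {phone for phone, _position in units}
    return len(phones) ** 3


if __name__ == "__main__":
    print("unconstrained:", count_unconstrained_triphones())
    print("syllable-constrained:", count_constrained_triphones())
```

With this toy inventory the position-independent phone set has 6 phones, so there are 216 logical triphones, whereas only 66 position-tagged triphones satisfy the assumed transitions. The actual figures for Thai depend on the real inventory and are not reproduced here.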