A HMM recognition of consonant-vowel syllables from lip contours: the cued speech case

Cued Speech (CS) is a manual code that complements lipreading to enhance speech perception from visual input. The phonetic translation of CS gestures needs to combine the manual CS information with information from the lips, taking into account the desynchronization delay (Attina et al. [1], Aboutabit et al. [2]) between these two flows of information. This paper focuses on HMM recognition of the lip flow for Consonant Vowel (CV) syllables in the French Cued Speech production context. The CV syllables are considered in term of viseme groups that are compatible with the CS system. The HMM modeling is based on parameters derived from both the inner and outer lip contours. The global recognition score of CV syllable reaches 80.3%. This study shows that the errors are mainly observed on consonant groups in the context of high and mid-high rounded vowels. In contrast, CV syllables for anterior non rounded vowels and for low and mid-low rounded vowels are well recognized (in average 87%).