Cued Speech (CS) is a manual code that complements lipreading to enhance speech perception from visual input. The phonetic translation of CS gestures needs to combine the manual CS information with information from the lips, taking into account the desynchronization delay (Attina et al. [1], Aboutabit et al. [2]) between these two flows of information. This paper focuses on HMM recognition of the lip flow for Consonant-Vowel (CV) syllables in the French Cued Speech production context. The CV syllables are considered in terms of viseme groups that are compatible with the CS system. The HMM modeling is based on parameters derived from both the inner and outer lip contours. The global recognition score for CV syllables reaches 80.3%. The errors occur mainly on consonant groups in the context of high and mid-high rounded vowels. In contrast, CV syllables with anterior non-rounded vowels and with low and mid-low rounded vowels are well recognized (87% on average).
[1] Dominique Vaufreydaz et al. A New Methodology for Speech Corpora Definition from Internet Documents. LREC, 2000.
[2] Christian Abry et al. "Laws" for lips. Speech Commun., 1986.
[3] Mohamed Tahar Lallouache et al. Un poste "visage-parole" couleur : acquisition et traitement automatique des contours des lèvres. 1991.
[4] M. C. Jones. Cued speech. ASHA, 1992.
[5] Denis Beautemps et al. A pilot study of temporal organization in Cued Speech production of French syllables: rules for a Cued Speech synthesizer. Speech Commun., 2004.
[6] Denis Beautemps et al. Hand and Lip Desynchronization Analysis in French Cued Speech: Automatic Temporal Segmentation of Hand Flow. 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 2006.