Automatic identification of vowels in the Cued Speech context

The phonetic translation of Cued Speech (CS) (Cornett [1]) gestures requires merging the manual CS information with the lip information, taking into account the desynchronization delay between these two flows (Attina et al. [2], Aboutabit et al. [3]). The automatic coding of CS hand positions and lip targets (Aboutabit et al. [3], Aboutabit et al. [4]) is thus a key factor in the merging process. This contribution focuses on the identification of vowels by merging the CS hand positions and the vocalic lip information produced by a CS speaker. The hand flow is coded automatically as plateaus separated by transition phases. A plateau is defined as the interval during which the hand is maintained at a specific CS hand position; a transition is the interval during which the hand moves from one CS hand position to another. The CS hand position is obtained automatically by Gaussian classification of the 2D hand coordinates. The instants at which hand targets are reached serve as reference instants to define the interval within which the lip target instant of the vowel is detected automatically. The lip parameters extracted at this instant are processed by a Gaussian classifier to identify the vocalic lip feature of the vowel. The vowel is obtained by combining the corresponding hand position with the lip feature. The method reaches a global correct identification score of 77.6%. This result does not take into account the CS coding errors. It is to be compared with the global 83.5% score of speech reception by deaf people using CS (Nichols and Ling, 1982 [6]).
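
As a rough illustration of the two Gaussian classification stages and the final fusion step described above, the Python sketch below fits one full-covariance Gaussian per class and combines the decoded hand position with the decoded lip feature through a code-table lookup. It is not the authors' implementation: the class labels, the CS_TABLE fragment, and the helper identify_vowel are illustrative assumptions introduced here, and the real CS code chart covers all hand positions and lip shapes.

    import numpy as np
    from scipy.stats import multivariate_normal


    class GaussianClassifier:
        """Maximum-likelihood classifier with one full-covariance Gaussian per class."""

        def fit(self, X, y):
            self.classes_ = np.unique(y)
            self.models_ = {
                c: multivariate_normal(mean=X[y == c].mean(axis=0),
                                       cov=np.cov(X[y == c], rowvar=False),
                                       allow_singular=True)
                for c in self.classes_
            }
            return self

        def predict(self, X):
            # Pick, for each sample, the class whose Gaussian gives the highest log-likelihood.
            log_lik = np.column_stack([self.models_[c].logpdf(X) for c in self.classes_])
            return self.classes_[np.argmax(log_lik, axis=1)]


    # Hypothetical fragment of a CS code table: a (hand position, lip feature)
    # pair selects a single vowel. Illustrative only, not the actual CS chart.
    CS_TABLE = {
        ("side", "rounded"): "o",
        ("side", "open"): "a",
        ("chin", "spread"): "e",
    }


    def identify_vowel(hand_xy, lip_params, hand_clf, lip_clf, table=CS_TABLE):
        """Fuse the two Gaussian decisions through the code-table lookup."""
        position = hand_clf.predict(np.atleast_2d(hand_xy))[0]
        lip_feature = lip_clf.predict(np.atleast_2d(lip_params))[0]
        return table.get((position, lip_feature))


    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Toy 2D hand coordinates drawn around two illustrative position centres.
        X_hand = np.vstack([rng.normal([0.0, 0.0], 0.5, (50, 2)),
                            rng.normal([5.0, 5.0], 0.5, (50, 2))])
        y_hand = np.array(["side"] * 50 + ["chin"] * 50)
        hand_clf = GaussianClassifier().fit(X_hand, y_hand)
        print(hand_clf.predict(np.array([[0.1, -0.2], [5.2, 4.9]])))  # -> ['side' 'chin']

In this sketch the same GaussianClassifier is reused for both streams: one instance trained on 2D hand coordinates sampled at the plateau instants, another on the lip parameters extracted at the detected lip target instant, with the table lookup playing the role of the final hand/lip combination.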