Audio-visual classification of Swedish phonemes for pronunciation training

We present a method for audio-visual classification of Swedish phonemes, to be used in computer-assisted pronunciation training. The probabilistic kernel-based method is applied to the audio signal and/or to either a principal component analysis (PCA) or an independent component analysis (ICA) representation of the mouth region in video images. We investigate which of the two representations is more suitable, and how many components the basis requires, for automatically detecting pronunciation errors in Swedish from audio-visual input. Experiments performed on one speaker show that the visual information helps avoid classification errors that would lead to gravely erroneous feedback to the user; that it is better to perform phoneme classification on audio and video separately and then fuse the results than to combine them before classification; and that PCA outperforms ICA when few components are used.

Index Terms: audiovisual phoneme classification, pronunciation error detection, PCA, ICA
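As an illustration of the late-fusion strategy mentioned above (classify audio and video separately, then fuse the results), the following is a minimal sketch only. The abstract does not specify the fusion rule, so the weighted product rule used here is an assumption, as are the function name `late_fusion`, the parameter `audio_weight`, and the posterior values, which are invented for the example and are not experimental data.

```python
import math


def late_fusion(audio_post, video_post, audio_weight=0.5):
    """Fuse per-phoneme posteriors from two separately trained classifiers.

    Assumed rule (not taken from the paper): a weighted product of the two
    posteriors, computed in the log domain and renormalised to sum to 1.
    """
    if audio_post.keys() != video_post.keys():
        raise ValueError("both classifiers must score the same phoneme classes")

    eps = 1e-12  # guards against log(0) for classes with zero probability
    log_scores = {
        ph: audio_weight * math.log(audio_post[ph] + eps)
        + (1.0 - audio_weight) * math.log(video_post[ph] + eps)
        for ph in audio_post
    }

    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    return {ph: math.exp(s - top) / total for ph, s in log_scores.items()}


# Hypothetical posteriors for one segment: the audio classifier slightly
# prefers /n/ over /m/, while the video classifier sees closed lips.
audio = {"m": 0.45, "n": 0.50, "a": 0.05}
video = {"m": 0.80, "n": 0.10, "a": 0.10}

fused = late_fusion(audio, video)
best = max(fused, key=fused.get)
print(best, fused)
```

In this toy case the audio stream alone would pick /n/, whereas the fused decision follows the visual evidence and picks /m/. Setting `audio_weight` closer to 1 shifts the decision back toward the audio classifier.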