Toward a speaker-independent visual articulatory feedback system

Context. Several studies suggest that visual articulatory feedback is useful for phonetic correction, both in speech therapy and in Computer-Aided Pronunciation Training (CAPT) [1]. In [2], we proposed a visual articulatory feedback system based on a 3D talking head used in an "augmented speech" scenario, i.e., one that displays all speech articulators, including the tongue and the velum. In the proposed system, the clone is automatically animated from the audio speech signal, using acoustic-to-articulatory inversion techniques based on statistical approaches. However, this system was speaker-dependent and thus could not be used in realistic situations. We report here on the latest developments of this system and present a first approach that makes it potentially usable by any speaker.
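As an illustration of the kind of statistical mapping such a system relies on, the sketch below implements GMM-based acoustic-to-articulatory inversion by minimum mean-square-error regression on a joint acoustic–articulatory model, one common statistical approach in this literature. The feature dimensions, placeholder data, and the use of scikit-learn/SciPy are our assumptions for the sketch, not the implementation described in [2].

```python
# Minimal sketch of GMM-based acoustic-to-articulatory inversion.
# Dimensions, mixture count, and random placeholder data are assumed
# for illustration; they do not reproduce the system in [2].
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

D_AC, D_ART, K = 24, 12, 32          # acoustic dim, articulatory dim, mixtures (assumed)

# Joint training data: each row pairs one acoustic frame x (e.g. MFCCs)
# with the simultaneous articulatory frame y (e.g. EMA coil coordinates).
X = np.random.randn(5000, D_AC)      # placeholder acoustic features
Y = np.random.randn(5000, D_ART)     # placeholder articulatory features
gmm = GaussianMixture(n_components=K, covariance_type="full").fit(np.hstack([X, Y]))

def invert(x):
    """MMSE estimate E[y | x] under the joint GMM p(x, y)."""
    mu_x = gmm.means_[:, :D_AC]                  # (K, D_AC)
    mu_y = gmm.means_[:, D_AC:]                  # (K, D_ART)
    S_xx = gmm.covariances_[:, :D_AC, :D_AC]     # (K, D_AC, D_AC)
    S_yx = gmm.covariances_[:, D_AC:, :D_AC]     # (K, D_ART, D_AC)
    # Posterior responsibilities p(k | x) from the acoustic marginal.
    log_p = np.log(gmm.weights_) + np.array(
        [multivariate_normal.logpdf(x, mu_x[k], S_xx[k]) for k in range(K)]
    )
    post = np.exp(log_p - logsumexp(log_p))
    # Per-component conditional means E[y | x, k], mixed by the posteriors.
    y_k = np.array(
        [mu_y[k] + S_yx[k] @ np.linalg.solve(S_xx[k], x - mu_x[k]) for k in range(K)]
    )
    return post @ y_k

y_hat = invert(np.random.randn(D_AC))  # articulatory estimate for one acoustic frame
```

In a real system, frame-by-frame estimates of this kind would then drive the articulatory parameters of the talking head; making the acoustic side of the mapping robust to unseen voices is precisely the speaker-independence problem addressed in this work.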