OF THE SPOKEN LANG BASED ON A MULTIMODAL INFA