Towards multilingual speech recognition using data-driven source/target acoustic unit association

Multilingual speech recognition leads us to study the acoustic modeling of target-language units in terms of the units of one or more source languages. This paper presents a study of manual and data-driven association of two possible target units, words and phonemes, with the phonemes of a source language. Algorithms for data-driven association are described. While phoneme-to-phoneme association is more practical, word transcription yields better results. We also show that more precise and richer source models are better suited to determining these associations. Experiments are conducted with French as the source language and Arabic as the target language.
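As an illustration only (the paper's actual algorithms are described in the body), the sketch below shows one plausible form of data-driven association: target-language speech is decoded with a source-language phoneme recognizer, the decoded source phonemes are aligned with the target-unit labels, and each target unit keeps its most frequent source-phoneme sequence. The function name, the input format, and the example data are all assumptions introduced here for clarity, not the authors' implementation.

```python
from collections import Counter, defaultdict

def associate_units(aligned_pairs):
    """Hypothetical data-driven association of target units with source phonemes.

    `aligned_pairs` is assumed to be an iterable of
    (target_unit, decoded_source_phoneme_sequence) pairs obtained by running a
    source-language (e.g. French) phoneme recognizer on target-language
    (e.g. Arabic) speech and aligning its output with the reference labels.
    """
    counts = defaultdict(Counter)
    for target_unit, source_phonemes in aligned_pairs:
        counts[target_unit][tuple(source_phonemes)] += 1
    # Keep the dominant source-phoneme sequence as each unit's transcription.
    return {unit: seqs.most_common(1)[0][0] for unit, seqs in counts.items()}

# Hypothetical example: three occurrences of one Arabic word decoded
# with French phonemes; the majority sequence is retained.
pairs = [
    ("kitab", ["k", "i", "t", "a", "b"]),
    ("kitab", ["k", "i", "t", "a", "b"]),
    ("kitab", ["k", "e", "t", "a", "b"]),
]
print(associate_units(pairs))  # {'kitab': ('k', 'i', 't', 'a', 'b')}
```

The same counting scheme would apply whether the target units are words or phonemes; only the granularity of `target_unit` changes.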