BUILDING LANGUAGE RESOURCES AND TRANSLATION MODELS FOR MACHINE TRANSLATION FOCUSED ON SOUTH SLAVIC AND BALKAN LANGUAGES

The paper presents the results of a small, short-term SEE-ERA.net project whose purpose was to investigate the feasibility of machine translation (MT) research and development for several South Slavic and Balkan languages. For these languages MT systems are scarce, and for some of them non-existent. We argue that, by investing effort in building appropriate language resources, current technology can be used successfully to develop acceptable MT prototypes quickly, and that these prototypes can easily be extended into working systems. The paper describes the parallel corpus compiled within the project, concentrating on its composition, format, and linguistic analysis. Word alignments automatically derived from the annotated parallel corpus are also discussed. The paper concludes with directions for further work.