Sinhala-Tamil Machine Translation: Towards Better Translation Quality

Statistical Machine Translation (SMT) is a well-established, data-driven approach to language translation. This work focuses on developing an SMT system for a Sri Lankan language pair, Sinhala and Tamil. The paper presents a systematic investigation of how Sinhala-Tamil SMT performance varies with the amount of parallel training data, in order to determine the minimum amount needed to develop a machine translation system with acceptable performance.