A Data Mining Approach to Learn Reorder Rules for SMT

In this paper, we describe a syntax based source side reordering method for phrase-based statistical machine translation (SMT) systems. The source side training corpus is first parsed, then reordering rules are automatically learnt from source-side phrases and word alignments. Later the source side training and test corpus are reordered and given to the SMT system. Reordering is a common problem observed in language pairs of distant language origins. This paper describes an automated approach for learning reorder rules from a word-aligned parallel corpus using association rule mining. Reordered and generalized rules are the most significant in our approach. Our experiments were conducted on an English-Hindi EILMT corpus.