Bilingual Phrase Extraction from N-Best Alignments

An improved phrase extraction approach is proposed for phrase-based statistical machine translation, and the effectiveness of using n-best alignments instead of the one-best alignment for phrase extraction is investigated. In the presented approach, bilingual phrase pairs are extracted by combining word-to-word links from the n-best alignments between source and target sentences. First, the n-best alignments are divided into hierarchical layers according to word co-occurrence frequencies. Second, candidate phrase pairs are extracted from each layer. Experimental results show that the presented approach outperforms the baseline system Pharaoh in both NIST and BLEU scores. It is therefore effective to use n-best alignments as an extension of the one-best alignment for phrase extraction.
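
The two steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: an alignment is represented as a set of (source index, target index) links; the "frequency" of a link is taken to be the number of n-best alignments containing it; layer k holds every link found in at least k alignments; and candidates are extracted with the standard alignment-consistency criterion, without extending spans over unaligned target words. The names `layer_links`, `extract_phrases`, and `max_len` are hypothetical.

```python
from collections import Counter


def layer_links(nbest):
    """Group word links into layers by how many n-best alignments contain them.

    Assumption: layer k holds every link that occurs in at least k of the
    n alignments, so layer n is the strictest and layer 1 the most permissive.
    """
    counts = Counter(link for alignment in nbest for link in set(alignment))
    n = len(nbest)
    return {k: {link for link, c in counts.items() if c >= k}
            for k in range(n, 0, -1)}


def extract_phrases(links, src_len, max_len=3):
    """Return consistent phrase pairs as ((i1, i2), (j1, j2)) inclusive spans.

    A pair is consistent when no link connects a word inside one span to a
    word outside the other span. Only the minimal target span is returned.
    """
    pairs = set()
    for i1 in range(src_len):
        for i2 in range(i1, min(i1 + max_len, src_len)):
            # Target positions linked to any source word in [i1, i2].
            tgt = [j for (i, j) in links if i1 <= i <= i2]
            if not tgt:
                continue
            j1, j2 = min(tgt), max(tgt)
            if j2 - j1 >= max_len:
                continue
            # Every link touching the target span must start inside [i1, i2].
            if all(i1 <= i <= i2 for (i, j) in links if j1 <= j <= j2):
                pairs.add(((i1, i2), (j1, j2)))
    return pairs


# Toy 3-best list for a three-word source and a three-word target sentence.
nbest = [
    {(0, 0), (1, 1), (2, 2)},
    {(0, 0), (1, 2), (2, 1)},
    {(0, 0), (1, 1), (2, 2)},
]
layers = layer_links(nbest)
candidates = {k: extract_phrases(links, src_len=3) for k, links in layers.items()}
```

In this toy case the layer of links shared by at least two alignments is monotone and yields more candidate pairs than the layer containing all links, where the crossing links block several spans. How candidates from different layers are merged or scored is not stated in the abstract, so the sketch leaves that step out.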