[PatentMT] Summary Report of Team III_CYUT_NTHU

In this report, we investigate two issues facing phrase-based machine translation (MT) systems such as Moses (Koehn et al., 2007): out-of-vocabulary (OOV) words and singletons. MT systems typically leave unknown, or OOV, source words untranslated and copy them directly into the target output. Singletons are words that do not combine with their preceding or following words to form phrases. For these words, MT systems typically leave translation disambiguation to the language model, whose knowledge is limited to a context window of preset length. We first analyze the proportion of OOV words and singletons in the translation task, summarize the types of OOV words, and manually evaluate the impact of singletons on phrase-based MT systems. We also introduce methods for dealing with these two issues without changing the underlying phrase-based decoder.
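To make the two proportions concrete, the following is a minimal sketch of how they might be measured. It is not the procedure used in this report. It assumes that the training vocabulary is available as a set of tokens and that the decoder's source-side phrase segmentation is available as a list of phrases per sentence. The function names `oov_rate` and `singleton_rate` are hypothetical.

```python
from typing import Iterable, List, Set


def oov_rate(sentences: Iterable[List[str]], vocab: Set[str]) -> float:
    """Fraction of source tokens that are absent from the training vocabulary."""
    total = 0
    unknown = 0
    for tokens in sentences:
        total += len(tokens)
        unknown += sum(1 for tok in tokens if tok not in vocab)
    return unknown / total if total else 0.0


def singleton_rate(segmentations: Iterable[List[List[str]]]) -> float:
    """Fraction of source tokens that the decoder translated as one-word phrases.

    Assumption: each sentence is given as its list of source phrases,
    and each phrase is a list of tokens.
    """
    total = 0
    single = 0
    for phrases in segmentations:
        for phrase in phrases:
            total += len(phrase)
            if len(phrase) == 1:
                single += 1
    return single / total if total else 0.0


# Toy data, invented for illustration only.
vocab = {"the", "method", "comprises"}
sentences = [["the", "method", "comprises", "graphene"]]
segmentations = [[["the", "method"], ["comprises"], ["graphene"]]]

print(oov_rate(sentences, vocab))
print(singleton_rate(segmentations))
```

Note that this sketch counts an OOV word as a singleton whenever it is segmented alone. An analysis that treats the two issues separately would need to exclude OOV tokens from the singleton count.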