Identification and Analysis of Post-Editing Patterns for MT

For this work we carried out a series of analysis experiments comparing raw MT output produced by Microsoft’s Treelet MT engine (Quirk et al., 2005) with its human post-edited counterpart, for English–German and English–French. Through these experiments we identify a number of recurring post-editing patterns, both textual (string-based) and constituent-based. In this paper we discuss our analysis methodologies, present some of our results, and describe how this type of analysis can benefit translation systems and post-editors, with a view to improving initial MT output and, consequently, post-editor productivity. We also discuss the MT and post-editing workflow at Microsoft and report results from MT post-editing pilots for a number of different language pairs.
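A textual (string-based) comparison of raw MT output against its post-edited counterpart can be illustrated with a token-level diff. The sketch below is not the analysis method used in this work. It assumes whitespace tokenization and uses Python's standard `difflib.SequenceMatcher` to extract insert, delete and replace operations. It then counts how often each (operation, MT span, post-edited span) triple recurs across a corpus, so that frequent edits surface as candidate post-editing patterns. The function names `edit_operations` and `count_patterns`, and the example sentence pair, are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def edit_operations(mt: str, pe: str) -> list[tuple[str, str, str]]:
    """Return (operation, mt_span, pe_span) for each non-matching token block.

    Assumption: simple whitespace tokenization; a real analysis would use
    a proper tokenizer for each target language.
    """
    mt_tokens, pe_tokens = mt.split(), pe.split()
    matcher = SequenceMatcher(a=mt_tokens, b=pe_tokens, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only 'replace', 'delete' and 'insert' blocks
            ops.append((tag, " ".join(mt_tokens[i1:i2]), " ".join(pe_tokens[j1:j2])))
    return ops


def count_patterns(pairs: list[tuple[str, str]]) -> Counter:
    """Count how often each edit operation recurs across (MT, post-edit) pairs."""
    counts: Counter = Counter()
    for mt, pe in pairs:
        counts.update(edit_operations(mt, pe))
    return counts


if __name__ == "__main__":
    # Hypothetical English–German example: the post-editor drops a redundant noun phrase.
    mt_out = "Klicken Sie auf die Schaltfläche OK ."
    post_edit = "Klicken Sie auf OK ."
    print(edit_operations(mt_out, post_edit))  # [('delete', 'die Schaltfläche', '')]
```

Counting exact span triples only captures surface patterns. Constituent-based patterns would require aligning the edits to a parse of the MT output, which this sketch does not attempt.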