CaptionEye/EK: an English-to-Korean caption translation system using sentence patterns

This paper describes CaptionEye/EK, an English-to-Korean caption translation system that translates English broadcast captions into Korean. CaptionEye/EK is built on a data-driven methodology that combines two steps: shallow bottom-up parsing between protectors, and top-down matching against structure-oriented sentence patterns. The shallow bottom-up parsing between protectors resembles noun-phrase parsing in rule-based machine translation. Protectors are the parts of speech that cause many structural ambiguities during structural analysis. The top-down matching resembles the matching step in example-based machine translation (EBMT), but unlike a bilingual example in EBMT, a sentence pattern is structure-oriented. Sentence patterns are built by treating the whole sentence as the unit of translation. Each one consists of a source sentence pattern and its corresponding target sentence pattern. To verify our translation methodology, we ran an experiment on 100 sentences randomly extracted from CNN news scripts, with an average length of 17.2 words per sentence. In the experiment, CaptionEye/EK achieved a translation rate of 61% using about 28,000 sentence patterns. Based on the graph showing how the translation rate progresses, we expect the translation rate to rise as the number of sentence patterns grows.
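The two-step method described above can be illustrated with a small sketch. It makes several assumptions that the abstract does not specify. The protector set is a hypothetical choice of Penn Treebank tags (verbs, prepositions, conjunctions, `to`, relative pronouns). The pattern table `PATTERNS` and its slot-template format are invented for illustration. Slot fillers are left in English because lexical transfer is outside the sketch. The function names (`shallow_parse`, `match_pattern`, `translation_rate`) are hypothetical and are not CaptionEye/EK's actual interface.

```python
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str]  # (word, POS tag)
Chunk = Tuple[str, str]  # (category, surface text)

# Hypothetical protector set: POS tags assumed to cause structural ambiguity.
PROTECTORS = {"VB", "VBD", "VBZ", "VBP", "IN", "CC", "TO", "WDT"}

# Hypothetical sentence-pattern table. A source pattern (sequence of chunk
# categories) maps to a target template whose {i} slot is the i-th source
# chunk, reordered into Korean SOV order with illustrative particles.
PATTERNS: Dict[Tuple[str, ...], str] = {
    ("NP", "VBD", "NP"): "{0}-은/는 {2}-을/를 {1}",
    # Assumes the preposition is locative (e.g. "in Seoul" -> "-에서").
    ("NP", "VBD", "NP", "IN", "NP"): "{0}-은/는 {4}-에서 {2}-을/를 {1}",
}


def shallow_parse(tokens: List[Token]) -> List[Chunk]:
    """Bottom-up step: merge each maximal run of non-protector tokens lying
    between protectors into one NP chunk; protectors pass through as-is."""
    chunks: List[Chunk] = []
    run: List[str] = []
    for word, tag in tokens:
        if tag in PROTECTORS:
            if run:
                chunks.append(("NP", " ".join(run)))
                run = []
            chunks.append((tag, word))
        else:
            run.append(word)
    if run:
        chunks.append(("NP", " ".join(run)))
    return chunks


def match_pattern(chunks: List[Chunk]) -> Optional[str]:
    """Top-down step: treat the whole sentence as the translation unit, look up
    its chunk-category sequence, and fill the target pattern (None if absent)."""
    key = tuple(category for category, _ in chunks)
    template = PATTERNS.get(key)
    if template is None:
        return None
    return template.format(*(text for _, text in chunks))


def translation_rate(sentences: List[List[Token]]) -> float:
    """Fraction of sentences covered by at least one sentence pattern."""
    hits = sum(match_pattern(shallow_parse(s)) is not None for s in sentences)
    return hits / len(sentences) if sentences else 0.0


if __name__ == "__main__":
    sentence = [("The", "DT"), ("president", "NN"), ("signed", "VBD"),
                ("the", "DT"), ("new", "JJ"), ("bill", "NN")]
    print(shallow_parse(sentence))
    # → [('NP', 'The president'), ('VBD', 'signed'), ('NP', 'the new bill')]
    print(match_pattern(shallow_parse(sentence)))
    # → The president-은/는 the new bill-을/를 signed
```

Because a sentence is translated only when its entire chunk sequence matches a stored pattern, coverage in this sketch grows directly with the size of the pattern table. This is consistent with the abstract's expectation that more sentence patterns yield a higher translation rate.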