Learning-to-Translate Based on the S-SSTC Annotation Schema

We present the S-SSTC framework for machine translation (MT), introduced in 2002 and since developed into a set of working MT systems (SiSTeC-ebmt). Our approach is example-based, but it differs from other EBMT approaches in that it uses alignments between string-tree correspondences (SSTCs), and in that supervised learning is an integral part of the approach. Our model directly addresses three main difficulties in the traditional treatment of MT, all of which stem from its separation from the "translation task" (the 'world'). First, by letting the system learn directly from real translation examples, we avoid indefinitely pursuing the elusive goal of writing grammars that exactly describe intermediate syntactico-semantic monolingual representations and their correspondences. Second, we make explicit the dependence of the MT system's performance on the input from the environment; this is possible only because the learning process uses feedback from real translation data when constructing its knowledge representation. Third, an MT system built on an inductively learned knowledge base exhibits desirable non-regressive behavior, since translation mistakes are used to improve the knowledge base.
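To make the annotation schema concrete, the following is a minimal data-structure sketch of the idea: an SSTC pairs a string with a tree and records which string interval(s) each node and subtree cover, and an S-SSTC synchronizes two SSTCs through links between their intervals. The class names, fields, and the toy English-French alignment below are our own illustration under these assumptions, not the paper's exact schema.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    snode: list                 # interval(s) of the string this node itself spans
    stree: list                 # interval(s) spanned by the node's whole subtree
    children: list = field(default_factory=list)

@dataclass
class SSTC:
    words: list                 # the string, as a token list
    root: Node                  # the tree, annotated with string correspondences

@dataclass
class S_SSTC:
    source: SSTC
    target: SSTC
    links: list                 # (source intervals, target intervals) pairs

def span(sstc, intervals):
    """Recover the (possibly discontinuous) substring an interval list denotes."""
    return [w for s, e in intervals for w in sstc.words[s:e]]

# Toy pair: English "he picks it up" <-> French "il le ramasse".
# Interval lists make the discontinuous "picks ... up" easy to record.
src = SSTC(["he", "picks", "it", "up"],
           Node("picks up", [(1, 2), (3, 4)], [(0, 4)], [
               Node("he", [(0, 1)], [(0, 1)]),
               Node("it", [(2, 3)], [(2, 3)])]))
tgt = SSTC(["il", "le", "ramasse"],
           Node("ramasse", [(2, 3)], [(0, 3)], [
               Node("il", [(0, 1)], [(0, 1)]),
               Node("le", [(1, 2)], [(1, 2)])]))
pair = S_SSTC(src, tgt, [
    ([(1, 2), (3, 4)], [(2, 3)]),   # "picks ... up"  <->  "ramasse"
    ([(0, 1)], [(0, 1)]),           # "he"            <->  "il"
    ([(2, 3)], [(1, 2)])])          # "it"            <->  "le"

s_int, t_int = pair.links[0]
print(span(src, s_int), "<->", span(tgt, t_int))
```

Note the 2-to-1 link for "picks ... up" / "ramasse": this is the kind of non-projective, many-to-one correspondence that string-tree annotation is designed to capture, and that aligned translation examples expose for learning.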