Automatic Acquisition of Hierarchical Transduction Models for Machine Translation

We describe a method for the fully automatic learning of hierarchical finite state translation models. The input to the method is transcribed speech utterances and their corresponding human translations, and the output is a set of head transducers, i.e. statistical lexical head-outward transducers. A word-alignment function and a head-ranking function are first obtained, and then counts are generated for hypothesized state transitions of head transducers whose lexical translations and word order changes are consistent with the alignment. The method has been applied to create an English-Spanish translation model for a Speech translation application, with word accuracy of over 75% as measured by a string-distance comparison to three reference translations.