Automatic Lexical Alignment between Syntactically Weak Related Languages. Application for English and Romanian

In this paper we describe an alignment system that takes English-Romanian parallel sentences (bitexts) and aligns them at the content-word level. The primary alignment technique combines syntactic features with dictionary lookup. Additional methods, based on local word grouping and on the nearest aligned neighbors, are used to filter many-to-many word alignments. A word-level alignment system can, in turn, support the creation of new resources, for example collections of parallel text sequences in the two languages from which translation schemes could be learned.
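To illustrate how dictionary lookup and nearest-neighbor filtering might interact, the following is a minimal sketch under stated assumptions. It is not the system described in the paper. It does not model syntactic features or local word grouping. The toy lexicon, the function names `candidate_links` and `filter_by_nearest_neighbor`, and the "preserve the offset from the nearest anchor" heuristic are all hypothetical. Source words with exactly one dictionary candidate are treated as anchors. Each ambiguous source word then keeps the candidate closest to the position predicted by its nearest anchor.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[int, int]  # (English index, Romanian index)


def candidate_links(src: List[str], tgt: List[str],
                    lexicon: Dict[str, Set[str]]) -> List[Link]:
    """Dictionary lookup: propose every pair whose words are listed as translations."""
    return [(i, j)
            for i, s in enumerate(src)
            for j, t in enumerate(tgt)
            if t in lexicon.get(s, set())]


def filter_by_nearest_neighbor(links: List[Link]) -> List[Link]:
    """Keep one target per source word, preferring the target closest to the
    position predicted by the nearest unambiguously aligned source word.
    Hypothetical heuristic; only source-side ambiguity is resolved here."""
    by_src: Dict[int, List[int]] = {}
    for i, j in links:
        by_src.setdefault(i, []).append(j)

    # Source words with exactly one candidate act as anchors.
    anchors = {i: js[0] for i, js in by_src.items() if len(js) == 1}
    result = dict(anchors)

    for i, js in by_src.items():
        if len(js) == 1:
            continue
        if anchors:
            nearest = min(anchors, key=lambda a: abs(a - i))
            # Assume the relative offset from the nearest anchor is roughly preserved.
            expected = anchors[nearest] + (i - nearest)
        else:
            expected = i  # no anchors at all: fall back to the diagonal
        # Closest candidate wins; ties go to the smaller target index.
        result[i] = min(js, key=lambda j: (abs(j - expected), j))
    return sorted(result.items())


# Toy example: "bought" has two dictionary candidates on the Romanian side.
lexicon = {"John": {"Ion"}, "bought": {"cumpărat"}, "car": {"mașină"},
           "Mary": {"Maria"}, "bike": {"bicicletă"}}
src = "John bought a car and Mary bought a bike".split()
tgt = "Ion a cumpărat o mașină și Maria a cumpărat o bicicletă".split()
links = filter_by_nearest_neighbor(candidate_links(src, tgt, lexicon))
print(links)  # [(0, 0), (1, 2), (3, 4), (5, 6), (6, 8), (8, 10)]
```

Function words such as English "a" and Romanian "a"/"o" receive no links here simply because the toy lexicon lists only content words. Each occurrence of "bought" is resolved to the "cumpărat" nearest to its neighboring anchor ("John" and "Mary", respectively).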