Sentence alignment in bilingual corpora based on crosslingual querying

The effectiveness of translation memory in computer-aided translation depends on the results of the sentence alignment performed beforehand. This paper describes a new approach to sentence alignment based on crosslingual querying, using the technology of an existing product, SPIRIT (Syntactic and Probabilistic Indexing and Retrieval of Information in Texts). Sentence alignment and crosslingual querying based on bilingual reformulation are similar problems: both rely on semantic proximity between two texts in different languages, and both aim to find the sentences that contain most of the information requested by the query. However, sentence alignment additionally requires the irrelevant part of a retrieved sentence to be as short as possible. Crosslingual querying supplies sentence alignment with candidate sentences. The ARCADE evaluation has shown that this approach is very robust in cases of inverted sentence order and missing segments.
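
The idea of using a reformulated query to retrieve alignment candidates can be illustrated with a minimal sketch. It is not SPIRIT's implementation: the toy lexicon, the function names (`reformulate`, `score`, `align`) and the scoring formula are all assumptions made for illustration. Each source sentence is reformulated word by word into target-language terms, every target sentence is scored by how much of the query it covers, and the score is multiplied by the share of the candidate that is relevant, so that candidates with a long irrelevant part are penalized.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy French-to-English lexicon, for illustration only.
LEXICON: Dict[str, Set[str]] = {
    "chat": {"cat"},
    "noir": {"black"},
    "dort": {"sleeps"},
    "chien": {"dog"},
    "mange": {"eats"},
}


def reformulate(sentence: str) -> Set[str]:
    """Bilingual reformulation: map each known source word to target terms."""
    terms: Set[str] = set()
    for word in sentence.lower().split():
        terms |= LEXICON.get(word, set())
    return terms


def score(query_terms: Set[str], candidate: str) -> float:
    """Assumed score: query coverage times the relevant share of the candidate."""
    candidate_words = set(candidate.lower().split())
    if not query_terms or not candidate_words:
        return 0.0
    matched = query_terms & candidate_words
    coverage = len(matched) / len(query_terms)      # information found
    relevance = len(matched) / len(candidate_words)  # short irrelevant part
    return coverage * relevance


def align(source: List[str], target: List[str]) -> List[Tuple[int, Optional[int]]]:
    """Pair each source sentence with its best-scoring target sentence.

    Every target sentence is a candidate, so sentence order does not matter.
    A source sentence with no matching candidate is paired with None.
    """
    pairs: List[Tuple[int, Optional[int]]] = []
    for i, sentence in enumerate(source):
        query = reformulate(sentence)
        scores = [score(query, candidate) for candidate in target]
        best = max(range(len(target)), key=scores.__getitem__) if target else None
        if best is not None and scores[best] > 0.0:
            pairs.append((i, best))
        else:
            pairs.append((i, None))
    return pairs


if __name__ == "__main__":
    french = ["le chat noir dort", "bonjour", "le chien mange"]
    english = ["the dog eats", "the black cat sleeps"]  # inverted order
    print(align(french, english))
```

Because each source sentence queries the whole target text rather than a position-based window, this sketch tolerates inverted sentence order, and a source sentence without any matching candidate is left unaligned instead of being forced onto a neighbour.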