The Challenge of Parallel Text Processing

This paper describes how a large German-French parallel corpus was built from official documents of the European Union and Switzerland and from documents of private and public organisations in France and Germany. The texts are morphosyntactically annotated, aligned at the sentence level, and marked up in conformance with the TEI guidelines for standardised representation. Sentences are aligned with a multilevel alignment method, whose precision is improved by correlating it with the constraints of the classical length-based alignment method of Gale and Church. The alignment information is encoded externally to the parallel text documents. Building the corpus is a multi-step procedure in which a number of software tools are applied and their intermediate results are adjusted.
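
As a rough illustration of the length-based constraints attributed above to Gale and Church, the following Python sketch aligns two lists of sentence lengths (in characters) by dynamic programming. It is not the multilevel method used for this corpus. The constants C and S2, the bead priors in PRIORS, and the function names length_cost and align are assumptions chosen for illustration, loosely following commonly cited descriptions of the Gale and Church approach.

```python
import math

# Assumed constants (illustrative, not taken from this paper):
C = 1.0    # expected number of target characters per source character
S2 = 6.8   # assumed variance of the length difference per character

# Assumed prior probabilities for each bead type (source count, target count).
PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099, (0, 1): 0.0099,
    (2, 1): 0.089, (1, 2): 0.089,
    (2, 2): 0.011,
}


def norm_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def length_cost(len_src, len_tgt, prior):
    """Negative log probability of pairing segments of the given lengths."""
    if len_src == 0 and len_tgt == 0:
        return -math.log(prior)
    mean = (len_src + len_tgt / C) / 2.0
    delta = (len_tgt - len_src * C) / math.sqrt(mean * S2)
    # Two-sided tail probability of a length mismatch at least this large.
    p = max(2.0 * (1.0 - norm_cdf(abs(delta))), 1e-300)
    return -math.log(p) - math.log(prior)


def align(src_lens, tgt_lens):
    """Return the cheapest sequence of beads as (source indices, target indices)."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            # Extend the best path ending at (i, j) by every bead type.
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = length_cost(sum(src_lens[i:ni]), sum(tgt_lens[j:nj]), prior)
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


# Hypothetical sentence lengths: two source sentences correspond to one target sentence.
print(align([30, 40, 50], [70, 50]))
```

Each bead pairs zero, one or two source sentences with zero, one or two target sentences. Because the result is a list of index pairs rather than modified text, it could be stored separately from the documents, in the spirit of the external alignment encoding mentioned above.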