Parallel text alignment is a key procedure in automated translation. Many aligners have been presented over the years, but they require that the target resources be prepared for alignment beforehand, either manually or automatically. It is quite common to encounter mixed-language documents, that is, documents in which the same information is written in several languages (e.g., manuals for electronic devices, tourist information, PhD theses with dual-language abstracts). In this article we present MLT-prealigner, a tool aimed at helping those who need to process mixed-language texts in order to feed alignment tools and other related language systems.

1998 ACM Subject Classification I.7.2 Document Preparation