Building a Spanish-Portuguese parallel corpus for statistical machine translation

Parallel corpora have long been recognised as valuable resources for building MT applications, but their usefulness has often been limited to language pairs that include English. In this work we describe our efforts to build a parallel corpus for Brazilian Portuguese and European Spanish. The corpus has been aligned at the sentence and word levels and manually inspected for correctness. It represents a first step towards the development of translation models for this language pair.