Paper information - MULTILINGUAL CORPORA FOR COOPERATION

MULTILINGUAL CORPORA FOR COOPERATION

MLCC was a corpus acquisition project funded by the EC Telematics program. The aim was to collect a set of texts representing a substantial improvement in the range, quantity and quality of available corpus material. Two sub-corpora were defined to help meet the need for multilingual data: a comparable set of texts in six languages and a parallel set of data in nine languages. The comparable text collection includes financial newspaper articles from the early 1990s. The parallel data is taken from the Official Journal of the European Commission, sub-series Written Questions to Parliament, and from the Proceedings of the European Parliament. The data has been converted to TEI-conformant SGML mark-up and is distributed by ELRA.
