Corpus for the Machine Translation: Types, Sizes and Connected Problems, in Relation to Use and System Type
Corpora used in MT (Machine Translation) of text and speech have evolved from the early test suites and test corpora to parallel bilingual and multilingual corpora, raw or enriched with metadata and a wide variety of linguistic annotations. In "expert" or classical MT they are relatively small and can have a large "granularity", whereas in "empirical" MT, whether statistical or example-based, they are very large and of small granularity. The representation of texts and the interface with speech processing pose specific problems, as do the segmentation and the structuring of segments and corpora. A current challenge is to unify and "wikify" their construction and management.