The Standard of Chinese Corpus Metadata
The normalization of corpus metadata plays a key role in building sharable corpora. However, there is currently no uniform specification for defining and processing metadata for Chinese corpora. This paper introduces a metadata system that we have proposed for Chinese corpora. In all, 46 elements are defined, divided into 6 classes: information about copyright, information about the background of the linguistic material's creator, information about the medium of the linguistic material, information about the content of the linguistic material, information about the collection of the linguistic material, and information about the management of the linguistic material. To distinguish one element from another, and our elements from those defined by others, we provide a detailed description method in which 10 subsections specify the properties of each element.
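The structure outlined above (elements grouped into six classes, each element described by 10 subsections) can be sketched as a small data model. This is only an illustration under stated assumptions, not the paper's specification: the abstract names neither the 46 elements nor the 10 subsections, so `MetadataElement`, `ElementClass`, `group_by_class`, and the keys of `description` are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterable, List


class ElementClass(Enum):
    """The six element classes listed in the abstract."""
    COPYRIGHT = "copyright"
    CREATOR_BACKGROUND = "background of the linguistic material's creator"
    MEDIUM = "medium of the linguistic material"
    CONTENT = "content of the linguistic material"
    COLLECTION = "collection of the linguistic material"
    MANAGEMENT = "management of the linguistic material"


# The abstract states that each element is described by 10 subsections.
# Their titles are not given there, so no titles are hard-coded here.
SUBSECTIONS_PER_ELEMENT = 10


@dataclass
class MetadataElement:
    name: str                      # hypothetical element name
    element_class: ElementClass    # one of the six classes
    # Maps a subsection title to its text; titles are placeholders.
    description: Dict[str, str] = field(default_factory=dict)

    def is_fully_described(self) -> bool:
        """True when all 10 description subsections are present."""
        return len(self.description) == SUBSECTIONS_PER_ELEMENT


def group_by_class(
    elements: Iterable[MetadataElement],
) -> Dict[ElementClass, List[MetadataElement]]:
    """Group elements under their class, keeping empty classes visible."""
    groups: Dict[ElementClass, List[MetadataElement]] = {
        cls: [] for cls in ElementClass
    }
    for element in elements:
        groups[element.element_class].append(element)
    return groups
```

Modelling the classes as an enumeration keeps the grouping closed to the six categories, while leaving the subsection titles as free-form keys avoids guessing at details the abstract does not give.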