Paper Information - The Standard of Chinese Corpus Metadata

The normalization of corpus metadata plays a key role in building sharable corpora. However, there is currently no uniform specification for defining and processing metadata for Chinese corpora. This paper introduces a metadata system that we propose for Chinese corpora. It defines 46 elements in total, divided into 6 classes: copyright information, background information about the creator of the linguistic material, information about the medium of the linguistic material, information about its content, information about how it was collected, and information about its management. To distinguish one element from another, and our elements from those defined by others, we provide a powerful description method in which 10 subsections describe the detailed properties of each element.
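As a rough illustration of the structure described above, the following Python sketch models the six element classes and a generic element record. The abstract does not list the 46 element names or the names of the 10 descriptive subsections, so `MetadataElement`, its fields, and `group_by_class` are hypothetical, and the sketch only checks that an element carries 10 descriptive entries.

```python
from dataclasses import dataclass, field
from enum import Enum


class ElementClass(Enum):
    """The six classes of metadata elements named in the abstract."""
    COPYRIGHT = "copyright"
    CREATOR_BACKGROUND = "background of the linguistic material's creator"
    MEDIUM = "medium of the linguistic material"
    CONTENT = "content of the linguistic material"
    COLLECTION = "collection of the linguistic material"
    MANAGEMENT = "management of the linguistic material"


# The abstract states that each element is described by 10 subsections but
# does not name them, so this sketch only checks the count.
SUBSECTIONS_PER_ELEMENT = 10


@dataclass
class MetadataElement:
    name: str                    # hypothetical element name
    element_class: ElementClass  # one of the six classes
    # Assumed layout: subsection name -> descriptive text for that subsection.
    description: dict = field(default_factory=dict)

    def is_fully_described(self) -> bool:
        """True when all 10 descriptive subsections are present."""
        return len(self.description) == SUBSECTIONS_PER_ELEMENT


def group_by_class(elements):
    """Group elements by class; every class appears, even if empty."""
    grouped = {cls: [] for cls in ElementClass}
    for element in elements:
        grouped[element.element_class].append(element)
    return grouped
```

Keeping the class as an enum and the subsections as a plain mapping is only one possible design; it lets the subsection names be filled in from the full specification without changing the record type.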

Tingting He | Xiaoqi Xu