Leveraging Chinese Encyclopedia for Weakly Supervised Relation Extraction

In supervised named-entity relation extraction, relation features are usually selected by hand, which makes traditional methods hard to apply to large-scale corpora. These methods also presuppose a fixed set of relation types, which limits their practicality. This paper presents a weakly supervised method for Chinese named-entity relation extraction that requires no manual annotation and no manually chosen relation types. The method collects entity relation types from the structured knowledge in encyclopedia pages and then uses these types to automatically annotate the relation instances that occur in the page text. It then extracts syntactic and semantic features of the entity relations to build the training data, trains Support Vector Machine (SVM) relation classifiers on that data, and applies the classifiers to extract entity relations from test data. Experiments on pages from the Chinese Baidu Encyclopedia show that the method is effective, with an overall F1 score of 83.12%. To examine how well the method generalizes, we also apply the learned relation classifiers to news texts, and the results indicate that the method generalizes to some degree.
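The automatic annotation step can be illustrated with a minimal sketch. It assumes that the structured knowledge of a page is available as attribute-value pairs and that a sentence is labeled with an attribute whenever it mentions both the page entity and the attribute value. The names `Instance` and `annotate`, the plain substring matching, and the toy data are hypothetical choices made for illustration, not the procedure used in the paper.

```python
from typing import Dict, List, NamedTuple


class Instance(NamedTuple):
    """One automatically labeled relation instance (hypothetical structure)."""
    head: str       # the entity the encyclopedia page describes
    tail: str       # the attribute value found in the sentence
    relation: str   # the attribute name, used as the relation type
    sentence: str   # the sentence that mentions both head and tail


def annotate(page_entity: str,
             infobox: Dict[str, str],
             sentences: List[str]) -> List[Instance]:
    """Label every sentence that mentions the page entity and an infobox value.

    Assumption: matching is plain substring search. A real system would
    need entity recognition and, for Chinese, word segmentation.
    """
    instances = []
    for sentence in sentences:
        if page_entity not in sentence:
            continue
        for relation, value in infobox.items():
            # Skip empty values, which would match every sentence.
            if value and value in sentence:
                instances.append(
                    Instance(page_entity, value, relation, sentence))
    return instances


# Toy data, invented for illustration only.
toy_infobox = {"birthplace": "Shaoxing", "occupation": "writer"}
toy_sentences = [
    "Lu Xun was born in Shaoxing.",
    "Lu Xun was a writer and essayist.",
    "Shaoxing is a city in Zhejiang.",
]

for instance in annotate("Lu Xun", toy_infobox, toy_sentences):
    print(instance.relation, "|", instance.head, "|", instance.tail)
```

Instances labeled in this way would then be converted into syntactic and semantic feature vectors and passed to the SVM classifiers. The third toy sentence is left unlabeled because it does not mention the page entity.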