Chinese News Event Corpus Construction Method Based on Syntax Tree

Event corpora are currently expanded mostly with weakly supervised models, which avoid expensive manual annotation. However, these models depend on a knowledge base and a small amount of manually annotated data, which makes them poorly portable. To address this problem, we construct a public-domain event extraction model based on syntax trees. We propose a classification of Chinese syntax tree structures from the perspective of event extraction, and present an event extraction algorithm for the various syntax tree types. In addition, to build the trigger word dictionary, we use cross-corpus dictionary information to construct a Chinese trigger word dictionary from a semantic perspective. As a result, we obtain 40,128 Chinese news events, which form an initial corpus of Chinese news events.
