Syntactic and Semantic-driven Learning for Open Information Extraction

One of the biggest bottlenecks in building accurate, high-coverage neural open IE systems is the need for large labelled corpora. The diversity of open-domain corpora and the variety of natural language expressions further exacerbate this problem. In this paper, we propose a syntactic and semantic-driven learning approach that can learn neural open IE models without any human-labelled data by leveraging syntactic and semantic knowledge as noisier, higher-level supervision. Specifically, we first employ syntactic patterns as data labelling functions and pretrain a base model on the generated labels. We then propose a syntactic and semantic-driven reinforcement learning algorithm, which effectively generalizes the base model to open situations with high accuracy. Experimental results show that our approach significantly outperforms the supervised counterparts, and can even achieve performance competitive with the supervised state-of-the-art (SoA) model.
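To illustrate the first step described above, the following is a minimal sketch (not the authors' implementation) of a syntactic pattern acting as a data labelling function: given a dependency-parsed sentence, an `nsubj → verb ← dobj` pattern either emits noisy BIO-style tags for a neural open IE tagger or abstains. The tag scheme, function name, and the hand-written parse are illustrative assumptions; in practice the parse would come from a dependency parser.

```python
from typing import List, Optional, Tuple

# Each token is (text, head index, dependency relation).
# The parse below is hand-written for illustration only.
Token = Tuple[str, int, str]

def svo_labelling_function(tokens: List[Token]) -> Optional[List[str]]:
    """Emit A0/P/A1 tags when an nsubj->verb<-dobj pattern fires, else abstain."""
    for i, (_, _, rel) in enumerate(tokens):
        if rel != "ROOT":
            continue
        subj = next((j for j, t in enumerate(tokens)
                     if t[1] == i and t[2] == "nsubj"), None)
        obj = next((j for j, t in enumerate(tokens)
                    if t[1] == i and t[2] == "dobj"), None)
        if subj is None or obj is None:
            return None  # pattern abstains: no training label is generated
        tags = ["O"] * len(tokens)
        tags[subj], tags[i], tags[obj] = "B-A0", "B-P", "B-A1"
        return tags
    return None

sentence = [("Edison", 1, "nsubj"), ("invented", 1, "ROOT"),
            ("the", 3, "det"), ("phonograph", 1, "dobj")]
print(svo_labelling_function(sentence))
# -> ['B-A0', 'B-P', 'O', 'B-A1']
```

Because such patterns are high-precision but low-coverage, the labels they produce are noisy, higher-level supervision: good enough to pretrain a base model, which the reinforcement learning stage then generalizes beyond the patterns' reach.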
