An Advantage Actor-Critic Algorithm with Confidence Exploration for Open Information Extraction

Open Information Extraction (OIE) is the task of generating structured representations of information from natural language sentences. In recent years, many works have trained end-to-end OIE extractors based on the Sequence-to-Sequence (Seq2Seq) model and applied the REINFORCE algorithm to update the model. However, model performance often suffers from large training variance and limited exploration. This paper introduces a reinforcement learning framework that applies an Advantage Actor-Critic (AAC) algorithm to update the Seq2Seq model with samples drawn from a novel Confidence Exploration (CE). The AAC algorithm reduces training variance through a fine-grained evaluation of each individual word. The confidence exploration provides effective training samples by exploring alternative words at key positions. Empirical evaluations demonstrate that our Advantage Actor-Critic algorithm with Confidence Exploration outperforms the comparison methods.
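
The abstract describes a per-word advantage actor-critic update combined with confidence-driven sampling. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch; it is not the authors' implementation, and the interfaces (`actor`, `critic`, `sequence_reward`, `confidence_explore`, the confidence threshold) are assumptions made for illustration only.

```python
# Illustrative sketch only: an advantage actor-critic step for a Seq2Seq extractor
# with per-word advantages and confidence-based exploration. All interfaces here
# are hypothetical placeholders, not the paper's released code.
import torch
import torch.nn.functional as F
from torch.distributions import Categorical


def confidence_explore(logits, greedy_tokens, threshold=0.5):
    """Resample tokens only where the actor's confidence is low.

    logits:        (batch, seq_len, vocab) decoder scores
    greedy_tokens: (batch, seq_len) argmax decoding
    Returns a sequence that differs from the greedy one only at low-confidence
    ("key") positions -- a simplified reading of Confidence Exploration.
    """
    probs = F.softmax(logits, dim=-1)
    confidence = probs.max(dim=-1).values            # (batch, seq_len)
    sampled = Categorical(probs=probs).sample()      # (batch, seq_len)
    explore_mask = confidence < threshold
    return torch.where(explore_mask, sampled, greedy_tokens)


def aac_update(actor, critic, optimizer, src, sequence_reward):
    """One AAC step with a per-word critic baseline (assumed interfaces):

    actor(src)          -> logits  (batch, seq_len, vocab)
    critic(src, tokens) -> values  (batch, seq_len), per-word value estimates
    sequence_reward(tokens) -> rewards (batch, seq_len), per-word rewards
    """
    logits = actor(src)
    greedy_tokens = logits.argmax(dim=-1)
    tokens = confidence_explore(logits, greedy_tokens)

    log_probs = Categorical(logits=logits).log_prob(tokens)   # (batch, seq_len)
    values = critic(src, tokens)                               # (batch, seq_len)
    rewards = sequence_reward(tokens)                          # (batch, seq_len)

    # Fine-grained, per-word advantage: reward minus the critic's value estimate.
    advantage = rewards - values.detach()

    actor_loss = -(advantage * log_probs).mean()
    critic_loss = F.mse_loss(values, rewards)

    optimizer.zero_grad()
    (actor_loss + critic_loss).backward()
    optimizer.step()
    return actor_loss.item(), critic_loss.item()
```

The per-word advantage is what distinguishes this from plain REINFORCE with a scalar sequence reward: each generated word is credited or penalized individually, which is the variance-reduction mechanism the abstract attributes to the AAC algorithm.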
