Bidirectional Context-Aware Hierarchical Attention Network for Document Understanding

The Hierarchical Attention Network (HAN) has made great strides, but it suffers from a major limitation: at level 1, each sentence is encoded in complete isolation. In this work, we propose and compare several modifications of HAN in which the sentence encoder is able to make context-aware attentional decisions (CAHAN). Furthermore, we propose a bidirectional document encoder that processes the document forwards and backwards, using the preceding and following sentences as context. Experiments on three large-scale sentiment and topic classification datasets show that the bidirectional version of CAHAN outperforms HAN on all datasets, with only a modest increase in computation time. While these results are promising, we expect the superiority of CAHAN to be even more evident on tasks requiring a deeper understanding of the input documents, such as abstractive summarization. Code is publicly available.
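To make the idea concrete, below is a minimal, hypothetical PyTorch sketch of a context-aware sentence encoder in the spirit of CAHAN: the word-level attention is conditioned on a vector summarising the neighbouring sentences instead of being computed in isolation. The class name, the ctx argument, and all hyper-parameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed names and sizes) of context-aware word attention.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextAwareSentenceEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hid_dim=50):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.gru = nn.GRU(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        # Attention scores depend on both the word annotation and a
        # document-level context vector of size 2 * hid_dim.
        self.attn = nn.Linear(4 * hid_dim, hid_dim)
        self.attn_vec = nn.Linear(hid_dim, 1, bias=False)

    def forward(self, word_ids, ctx):
        # word_ids: (batch, seq_len) token indices for one sentence
        # ctx:      (batch, 2 * hid_dim) summary of preceding/following sentences
        h, _ = self.gru(self.embedding(word_ids))            # (batch, seq, 2 * hid)
        ctx_exp = ctx.unsqueeze(1).expand(-1, h.size(1), -1)
        scores = self.attn_vec(torch.tanh(self.attn(torch.cat([h, ctx_exp], dim=-1))))
        alpha = F.softmax(scores, dim=1)                      # word attention weights
        return (alpha * h).sum(dim=1)                         # context-aware sentence vector


if __name__ == "__main__":
    enc = ContextAwareSentenceEncoder(vocab_size=1000)
    sent = torch.randint(1, 1000, (2, 12))   # 2 sentences of 12 tokens each
    ctx = torch.zeros(2, 100)                # e.g. running summary of prior sentences
    print(enc(sent, ctx).shape)              # torch.Size([2, 100])
```

In a bidirectional document encoder, ctx could be, for instance, a summary of the sentence vectors already produced in a left-to-right pass and, symmetrically, in a right-to-left pass, with the two resulting document representations combined before classification.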
