Deriving Machine Attention from Human Rationales

Attention-based models are successful when trained on large amounts of data. In this paper, we demonstrate that attention can be learned effectively even in low-resource settings. To this end, we start with discrete, human-annotated rationales and map them into continuous attention. Our central hypothesis is that this mapping is general across domains and can therefore be transferred from resource-rich domains to low-resource ones. Our model jointly learns a domain-invariant representation and induces the desired mapping between rationales and attention. Our empirical results validate this hypothesis and show that our approach delivers significant gains over state-of-the-art baselines, yielding over 15% average error reduction on benchmark datasets.
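
Concretely, the approach can be pictured as a shared encoder whose pooled representation is pushed toward domain invariance by an adversarial domain discriminator, while an attention head over the same token representations is supervised on the resource-rich domain to match the human rationale mask. The PyTorch sketch below illustrates this structure under stated assumptions; all names (GradReverse, R2AMapper, rationale_to_target) and architectural choices are illustrative, not the paper's actual implementation.

```python
# Minimal sketch of mapping human rationales to continuous attention
# with a domain-invariant encoder. Names and architecture are
# illustrative assumptions, not the paper's released model.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Gradient-reversal layer (as in domain-adversarial training):
    identity on the forward pass, negated gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None


class R2AMapper(nn.Module):
    """Hypothetical rationale-to-attention mapper: encodes tokens,
    scores each position for attention, and predicts the domain
    from a gradient-reversed pooled encoding."""
    def __init__(self, vocab_size, emb_dim=100, hid_dim=128, n_domains=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.enc = nn.LSTM(emb_dim, hid_dim, batch_first=True,
                           bidirectional=True)
        self.att_score = nn.Linear(2 * hid_dim, 1)        # attention head
        self.dom_clf = nn.Linear(2 * hid_dim, n_domains)  # domain discriminator

    def forward(self, tokens, lam=1.0):
        h, _ = self.enc(self.emb(tokens))                 # (B, T, 2H)
        att = torch.softmax(self.att_score(h).squeeze(-1), dim=-1)
        # Adversarial domain prediction on the pooled, gradient-reversed
        # encoding encourages a domain-invariant representation.
        pooled = GradReverse.apply(h.mean(dim=1), lam)
        return att, self.dom_clf(pooled)


def rationale_to_target(mask):
    """Normalize a binary rationale mask into a distribution over tokens,
    giving a soft supervision target for the attention weights."""
    mask = mask.float()
    return mask / mask.sum(dim=-1, keepdim=True).clamp(min=1.0)


# Toy usage: on the source domain, supervise attention with rationales.
model = R2AMapper(vocab_size=10_000)
tokens = torch.randint(0, 10_000, (4, 20))    # toy token batch
rationale = torch.randint(0, 2, (4, 20))      # human-annotated binary mask
att, dom_logits = model(tokens)
att_loss = nn.functional.kl_div(att.clamp(min=1e-8).log(),
                                rationale_to_target(rationale),
                                reduction="batchmean")
```

In a full training loop, the attention loss above would be combined with the task loss and the (adversarial) domain-classification loss, so that the learned rationale-to-attention mapping transfers to the low-resource target domain.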
