Jointly Learning to Label Sentences and Tokens

Learning to construct text representations in end-to-end systems can be difficult, as natural languages are highly compositional and task-specific annotated datasets are often limited in size. Methods for directly supervising language composition allow us to guide the models with existing knowledge, regularizing them towards more robust and interpretable representations. In this paper, we investigate how objectives at different granularities can be used to learn better language representations, and we propose an architecture for jointly learning to label sentences and tokens. The predictions at each level are combined using an attention mechanism, with token-level labels also acting as explicit supervision for composing sentence-level representations. Our experiments show that by learning to perform these tasks jointly at multiple levels, the model achieves substantial improvements on both sentence classification and sequence labeling.
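As a rough illustration of this joint architecture, the sketch below shows how per-token label scores can double as attention weights when composing the sentence representation, so that token-level supervision directly shapes the sentence-level prediction. This is a minimal PyTorch sketch, not the paper's exact formulation: the class and parameter names (JointLabeler, token_scorer, sentence_scorer, gamma) and the binary-label setup are illustrative assumptions.

```python
import torch
import torch.nn as nn

class JointLabeler(nn.Module):
    """Minimal sketch: a BiLSTM encodes tokens, a per-token scorer produces
    token-level label logits, and those same scores (softmax-normalised)
    serve as attention weights for building the sentence representation."""

    def __init__(self, vocab_size, emb_dim=300, hidden_dim=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim,
                              bidirectional=True, batch_first=True)
        self.token_scorer = nn.Linear(2 * hidden_dim, 1)     # token-level logits
        self.sentence_scorer = nn.Linear(2 * hidden_dim, 1)  # sentence-level logit

    def forward(self, token_ids):
        h, _ = self.bilstm(self.embed(token_ids))              # (batch, seq, 2H)
        token_logits = self.token_scorer(h).squeeze(-1)        # (batch, seq)
        attn = torch.softmax(token_logits, dim=-1)              # token scores act as attention
        sent_vec = torch.bmm(attn.unsqueeze(1), h).squeeze(1)  # (batch, 2H)
        sent_logit = self.sentence_scorer(sent_vec).squeeze(-1)
        return token_logits, sent_logit

def joint_loss(token_logits, sent_logit, token_labels, sent_label, gamma=0.1):
    """Joint objective: sentence-level loss plus a weighted token-level loss."""
    bce = nn.functional.binary_cross_entropy_with_logits
    return bce(sent_logit, sent_label) + gamma * bce(token_logits, token_labels)

# Example forward pass on random data (batch of 2 sentences, 7 tokens each).
model = JointLabeler(vocab_size=1000)
tok_logits, sent_logit = model(torch.randint(0, 1000, (2, 7)))
loss = joint_loss(tok_logits, sent_logit, torch.zeros(2, 7), torch.ones(2))
```

Because the attention weights are computed from the token-label scores themselves, any supervision applied to the token logits also constrains how the sentence representation is composed, which is the explicit link between the two granularities described above.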
