GILE: A Generalized Input-Label Embedding for Text Classification

Neural text classification models typically treat output labels as categorical variables that lack description and semantics. This forces their parametrization to depend on the size of the label set, and hence they are unable to scale to large label sets or to generalize to unseen ones. Existing joint input-label text models overcome these issues by exploiting label descriptions, but they are unable to capture complex label relationships, have rigid parametrization, and their gains on unseen labels often come at the expense of weak performance on the labels seen during training. In this paper, we propose a new input-label model that generalizes over previous such models, addresses their limitations, and does not compromise performance on seen labels. The model consists of a joint nonlinear input-label embedding with controllable capacity and a joint-space-dependent classification unit that is trained with cross-entropy loss to optimize classification performance. We evaluate models on full-resource and low- or zero-resource text classification of multilingual news and biomedical text with a large label set. Our model outperforms monolingual and multilingual models that do not leverage label semantics, as well as previous joint input-label space models, in both scenarios.
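
The architecture described above lends itself to a compact sketch. The following is a minimal, hypothetical PyTorch rendering of the idea, not the paper's exact parametrization: documents and label descriptions are projected nonlinearly into a shared space whose size controls capacity, and a small classification unit scores their interaction in that space, so the parameter count is independent of the number of labels. All names, dimensions, and the specific choices of sigmoid projections, multiplicative interaction, and binary cross-entropy for the multi-label case are illustrative assumptions.

```python
import torch
import torch.nn as nn


class JointInputLabelEmbedding(nn.Module):
    """Sketch of a generalized input-label embedding classifier.

    `d_doc`, `d_label`, and `d_joint` are illustrative dimensions: the
    document-encoder output size, the label-description embedding size,
    and the capacity of the joint space, respectively.
    """

    def __init__(self, d_doc: int, d_label: int, d_joint: int):
        super().__init__()
        # Nonlinear projections of inputs and labels into a shared space;
        # d_joint controls the capacity of the joint embedding.
        self.doc_proj = nn.Linear(d_doc, d_joint)
        self.label_proj = nn.Linear(d_label, d_joint)
        # Joint-space-dependent classification unit: one scalar score per
        # (document, label) pair, independent of the label-set size.
        self.scorer = nn.Linear(d_joint, 1)

    def forward(self, docs: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # docs: (batch, d_doc) encoded documents
        # labels: (n_labels, d_label) encoded label descriptions
        u = torch.sigmoid(self.doc_proj(docs))       # (batch, d_joint)
        v = torch.sigmoid(self.label_proj(labels))   # (n_labels, d_joint)
        joint = u.unsqueeze(1) * v.unsqueeze(0)      # (batch, n_labels, d_joint)
        return self.scorer(joint).squeeze(-1)        # (batch, n_labels) logits


# Usage: at test time, unseen labels only need description embeddings.
model = JointInputLabelEmbedding(d_doc=256, d_label=300, d_joint=128)
docs = torch.randn(4, 256)        # e.g., from a document encoder
labels = torch.randn(10, 300)     # e.g., averaged label-description word vectors
logits = model(docs, labels)      # (4, 10)
targets = torch.randint(0, 2, (4, 10)).float()
loss = nn.functional.binary_cross_entropy_with_logits(logits, targets)
```

Because scoring depends only on the joint space, adding labels at test time reduces to supplying their description embeddings, which is what makes the low- and zero-resource settings possible.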
