Metric Sentiment Learning for Label Representation

Label representation aims to generate a so-called verbalizer for an input text and has broad applications in text classification, event detection, question answering, and related tasks. Previous work on label representation, especially in the few-shot setting, mainly defines verbalizers manually, which is accurate but time-consuming. Other models fail to produce antonymous verbalizers for two semantically opposite classes. In this paper, we therefore propose a metric sentiment learning framework (MSeLF) that generates verbalizers automatically and accurately captures the sentiment differences between them. MSeLF consists of two major components: the contrastive mapping learning (CML) module and the equal-gradient verbalizer acquisition (EVA) module. CML learns a transformation matrix that projects the initial word embeddings into antonym-aware embeddings by enlarging the distance between antonyms. Then, in the antonym-aware embedding space, EVA first takes a pair of antonymous words as verbalizers for the two opposite classes and then applies a sentiment transition vector to generate verbalizers for the intermediate classes. We use the generated verbalizers for downstream few-shot text classification on two publicly available fine-grained datasets. The results indicate that our proposal outperforms state-of-the-art baselines in terms of accuracy. In addition, we find that CML can be used as a flexible plug-in component in other verbalizer acquisition approaches.
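To make the two-stage idea concrete, the following is a minimal sketch under stated assumptions, not the MSeLF implementation. The abstract does not specify the CML objective or how the sentiment transition vector is computed, so the sketch assumes a fixed, hand-picked matrix `W` as a stand-in for the learned transformation, evenly spaced steps between the two antonym embeddings as the transition, and a nearest-neighbour lookup to pick each verbalizer. The function names, the toy vocabulary, and the 2-D embeddings are all hypothetical.

```python
import numpy as np


def equal_step_anchors(e_neg, e_pos, num_classes):
    """Return num_classes points evenly spaced from e_neg to e_pos.

    The per-class step is an ASSUMED form of the "sentiment transition
    vector"; the abstract does not define it.
    """
    step = (e_pos - e_neg) / (num_classes - 1)
    return np.stack([e_neg + k * step for k in range(num_classes)])


def nearest_words(anchors, vocab, embeddings):
    """For each anchor, pick the vocabulary word with the closest embedding."""
    # dists[i, j] = Euclidean distance between anchor i and word j
    dists = np.linalg.norm(anchors[:, None, :] - embeddings[None, :, :], axis=-1)
    return [vocab[j] for j in dists.argmin(axis=1)]


# Hypothetical toy vocabulary with 2-D "initial" embeddings.
vocab = ["terrible", "bad", "okay", "good", "great"]
initial = np.array(
    [[-2.0, 0.1], [-1.0, -0.2], [0.0, 0.3], [1.0, 0.0], [2.0, 0.2]]
)

# Stand-in for CML: a fixed linear map that stretches the first axis,
# which pushes the antonyms "terrible" and "great" further apart.
# In the paper this matrix is learned; here it is simply assumed.
W = np.array([[2.0, 0.0], [0.0, 1.0]])
projected = initial @ W.T

# Stand-in for EVA: anchor the two extreme classes on an antonym pair,
# then step between them to obtain the intermediate classes.
anchors = equal_step_anchors(projected[0], projected[4], num_classes=5)
verbalizers = nearest_words(anchors, vocab, projected)
print(verbalizers)
```

On this toy data the five classes map to the five words in order of increasing sentiment. With real embeddings the nearest-neighbour step would need a restricted candidate vocabulary, which the sketch leaves out.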
