Rank-Aware Negative Training for Semi-Supervised Text Classification

Abstract Semi-supervised text classification (SSTC) paradigms typically follow the spirit of self-training: a deep classifier is trained on limited labeled texts and then iteratively predicts pseudo-labels for the unlabeled texts, which are used for further training. However, performance hinges on the accuracy of the pseudo-labels, which may be low in real-world scenarios. This paper presents a Rank-aware Negative Training (RNT) framework that addresses SSTC as learning with noisy labels. To alleviate the noisy information, we adapt a reasoning-with-uncertainty approach that ranks the unlabeled texts according to the evidential support they receive from the labeled texts. Moreover, we propose training RNT with negative training, which follows the principle that "the input instance does not belong to the complementary label". A complementary label is randomly selected from all labels except the target label. Intuitively, the probability that the true label is selected as the complementary label is low, so negative training introduces less noisy information during training and yields better performance on the test data. Finally, we evaluate the proposed solution on various text classification benchmark datasets. Extensive experiments show that RNT consistently outperforms state-of-the-art alternatives in most scenarios and achieves competitive performance in the remaining ones. The code of RNT is publicly available on GitHub.
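
To make the negative-training idea concrete, the following is a minimal sketch (not the authors' released implementation) of a complementary-label loss in PyTorch. The function name and the assumption of a standard softmax classifier over pseudo-labeled texts are illustrative only; the key point is that each example is penalized for assigning probability to a randomly drawn complementary class rather than rewarded for its (possibly noisy) pseudo-label.

```python
import torch
import torch.nn.functional as F

def negative_training_loss(logits, pseudo_labels, num_classes):
    """Sketch of a negative-training (complementary-label) objective.

    For each example, a complementary label is sampled uniformly from all
    classes except the given pseudo-label, and the loss pushes the predicted
    probability of that complementary class toward zero, i.e. it minimizes
    -log(1 - p_complementary).
    """
    # Sample one complementary label per example: any class other than
    # the pseudo-label (requires num_classes >= 2).
    offsets = torch.randint(1, num_classes, pseudo_labels.shape,
                            device=pseudo_labels.device)
    complementary = (pseudo_labels + offsets) % num_classes

    probs = F.softmax(logits, dim=-1)
    p_comp = probs.gather(1, complementary.unsqueeze(1)).squeeze(1)

    # "The input instance does not belong to the complementary label."
    return -torch.log((1.0 - p_comp).clamp(min=1e-7)).mean()
```

In a self-training loop, this loss would be applied to the pseudo-labeled texts (e.g., `loss = negative_training_loss(model(batch), pseudo_labels, num_classes)`), while the ranking step described in the abstract decides which unlabeled texts are trustworthy enough to include.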
