Self-Training with Weak Supervision

State-of-the-art deep neural networks require large-scale labeled training data that is often expensive to obtain or unavailable for many tasks. Weak supervision in the form of domain-specific rules has been shown to be useful in such settings for automatically generating weakly labeled training data. However, learning with weak rules is challenging due to their inherently heuristic and noisy nature. An additional challenge is rule coverage and overlap: prior work on weak supervision considers only instances that are covered by weak rules, leaving valuable unlabeled data behind. In this work, we develop a weak supervision framework (ASTRA) that leverages all the available data for a given task. To this end, we exploit task-specific unlabeled data through self-training with a model (the student) that uses contextualized representations to predict pseudo-labels for instances that may not be covered by weak rules. We further develop a rule attention network (the teacher) that learns how to aggregate student pseudo-labels with weak rule labels, conditioned on their fidelity and the underlying context of an instance. Finally, we construct a semi-supervised learning objective for end-to-end training with unlabeled data, domain-specific rules, and a small amount of labeled data. Extensive experiments on six benchmark datasets for text classification demonstrate the effectiveness of our approach, with significant improvements over state-of-the-art baselines.
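To make the teacher's aggregation step concrete, the following is a minimal PyTorch sketch of how a rule attention network might weight weak-rule votes together with the student's pseudo-label, conditioned on an instance embedding. The class name RuleAttentionTeacher, its input layout, and the masking scheme are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Minimal sketch (not the authors' code) of attention-based aggregation of
# weak-rule votes and a student pseudo-label; all names here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RuleAttentionTeacher(nn.Module):
    """Combines weak-rule votes and the student's pseudo-label into one soft label,
    weighting each source by an attention score conditioned on the instance embedding."""

    def __init__(self, embed_dim: int, num_rules: int, num_classes: int):
        super().__init__()
        # One attention score per labeling source: each weak rule plus the student.
        self.attention = nn.Linear(embed_dim, num_rules + 1)
        self.num_classes = num_classes

    def forward(self, embeddings, rule_votes, rule_mask, student_probs):
        """
        embeddings:    [batch, embed_dim]   contextualized instance representations
        rule_votes:    [batch, num_rules]   class index fired by each rule (-1 = no fire)
        rule_mask:     [batch, num_rules]   1.0 where the rule fires, else 0.0
        student_probs: [batch, num_classes] student pseudo-label distribution
        """
        # One-hot the rule votes; non-firing rules are zeroed out by the mask below.
        votes = F.one_hot(rule_votes.clamp(min=0), self.num_classes).float()

        # Attention over [rules ; student]; mask rules that did not fire so that
        # uncovered instances fall back to the student's pseudo-label.
        scores = self.attention(embeddings)                        # [batch, R+1]
        mask = torch.cat([rule_mask, torch.ones_like(rule_mask[:, :1])], dim=1)
        scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=1)                     # [batch, R+1]

        # Weighted combination of rule one-hots and the student distribution.
        sources = torch.cat([votes, student_probs.unsqueeze(1)], dim=1)  # [batch, R+1, C]
        return (weights.unsqueeze(-1) * sources).sum(dim=1)        # soft labels [batch, C]
```

Because the student column is never masked, an instance covered by no rules simply inherits the student's pseudo-label, which is one way to realize the "leverage all available data" goal described above.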
