Learning from Rules Generalizing Labeled Exemplars

In many applications labeled data is not readily available and must be collected through painstaking human supervision. We propose a rule-exemplar model for collecting human supervision that combines the scalability of rules with the quality of instance labels. The two forms of supervision are coupled such that they are both natural for humans to provide and synergistic for learning. We propose a training algorithm that jointly denoises rules via latent coverage variables and trains the model through a soft implication loss over the coverage and label variables. Empirical evaluation on five tasks shows that (1) our algorithm is more accurate than several existing methods of learning from a mix of clean and noisy supervision, and (2) the coupled rule-exemplar supervision is effective in denoising rules.
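To make the coupling concrete, the sketch below shows one way a soft implication loss could be written in PyTorch. It is only an illustration under stated assumptions, not the paper's exact formulation: the tensor names, the particular softening of the implication "rule fires on x_i implies y_i equals the rule's class" as 1 - P(r)(1 - P(y = l_j)), and the masking over syntactically covered pairs are all hypothetical choices made here for clarity.

    import torch

    def soft_implication_loss(p_cover, p_label, rule_classes, cover_mask):
        """
        Illustrative soft implication loss (a sketch, not the paper's exact loss).

        p_cover:      [batch, num_rules]   P(r_ij = 1), prob. that rule j truly applies to x_i
        p_label:      [batch, num_classes] P(y_i = c), classifier's label distribution
        rule_classes: [num_rules]          class each rule assigns when it fires
        cover_mask:   [batch, num_rules]   1 where rule j syntactically covers x_i, else 0
        """
        cover_mask = cover_mask.float()
        # P(y_i = l_j): probability the classifier agrees with rule j's class.
        p_agree = p_label[:, rule_classes]                  # [batch, num_rules]
        # Soft truth value of the implication r_ij => (y_i = l_j):
        # it is violated only when the rule fires but the label disagrees.
        implication = 1.0 - p_cover * (1.0 - p_agree)
        # Penalize low implication probability only on covered (i, j) pairs.
        loss = -(cover_mask * torch.log(implication + 1e-8)).sum()
        return loss / cover_mask.sum().clamp(min=1.0)

    # Hypothetical usage: 4 instances, 3 rules, 5 classes (all values random).
    p_cover = torch.rand(4, 3)
    p_label = torch.softmax(torch.randn(4, 5), dim=-1)
    rule_classes = torch.tensor([0, 2, 4])
    cover_mask = torch.randint(0, 2, (4, 3)).float()
    print(soft_implication_loss(p_cover, p_label, rule_classes, cover_mask))

Under this softening, the loss is small whenever either the latent coverage variable "denoises" the rule away (P(r) near 0) or the classifier agrees with the rule's label, which is the qualitative behavior the abstract describes.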
