Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods

Weak supervision is a popular method for building machine learning models without relying on ground truth annotations. Instead, it generates probabilistic training labels by estimating the accuracies of multiple noisy labeling sources (e.g., heuristics, crowd workers). Existing approaches use latent variable estimation to model the noisy sources, but these methods can be computationally expensive, scaling superlinearly in the data. In this work, we show that, for a class of latent variable models highly applicable to weak supervision, we can find a closed-form solution to model parameters, obviating the need for iterative solutions like stochastic gradient descent (SGD). We use this insight to build FlyingSquid, a weak supervision framework that runs orders of magnitude faster than previous weak supervision approaches and requires fewer assumptions. In particular, we prove bounds on generalization error without assuming that the latent variable model can exactly parameterize the underlying data distribution. Empirically, we validate FlyingSquid on benchmark weak supervision datasets and find that it achieves the same or higher quality compared to previous approaches without the need to tune an SGD procedure, recovers model parameters 170 times faster on average, and enables new video analysis and online learning applications.
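The closed-form estimator the abstract alludes to is the triplet method: if three labeling sources vote in {-1, +1} and are conditionally independent given the true label Y, their pairwise agreement rates factor as E[λ_i λ_j] = a_i a_j, where a_i = E[λ_i Y] is source i's rescaled accuracy. Solving that system recovers each a_i directly, with no SGD. Below is a minimal sketch of the identity under those assumptions; it is illustrative, not FlyingSquid's actual API, and the function name, the symmetric-noise model, and the better-than-random sign convention are all assumptions made for the example.

```python
import numpy as np

def triplet_accuracies(L):
    """Closed-form accuracy estimates for three conditionally
    independent voters with labels in {-1, +1}.

    Uses the triplet identity E[l_i l_j] = a_i * a_j, where
    a_i = E[l_i Y] in [-1, 1] is source i's rescaled accuracy.
    """
    O = (L.T @ L) / len(L)  # empirical second moments E[l_i l_j]
    a = np.empty(3)
    for i in range(3):
        j, k = [m for m in range(3) if m != i]
        # a_i^2 = E[l_i l_j] * E[l_i l_k] / E[l_j l_k]; take the
        # positive root, assuming sources are better than random.
        a[i] = np.sqrt(np.abs(O[i, j] * O[i, k] / O[j, k]))
    return a

# Toy check with synthetic votes: each source flips the true label
# with probability 1 - p_i, so it should recover a_i = 2 * p_i - 1.
rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=100_000)
p = np.array([0.9, 0.7, 0.6])  # hypothetical per-source accuracies
L = np.stack([np.where(rng.random(y.shape) < pi, y, -y) for pi in p], axis=1)
print(triplet_accuracies(L))   # roughly [0.8, 0.4, 0.2]
```

On synthetic votes the recovered a_i approach 2p_i − 1, as expected. The full method in the paper generalizes this idea to many sources, abstentions, and known dependency structure by averaging over triplets of conditionally independent sources.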
