Estimation from Indirect Supervision with Linear Moments

In structured prediction problems where we have indirect supervision of the output, maximum marginal likelihood faces two computational obstacles: non-convexity of the objective and intractability of even a single gradient computation. In this paper, we bypass both obstacles for a class of what we call linear indirectly-supervised problems. Our approach is simple: we solve a linear system to estimate sufficient statistics of the model, which we then use to estimate parameters via convex optimization. We analyze the statistical properties of our approach and show empirically that it is effective in two settings: learning with local privacy constraints and learning from low-cost count-based annotations.
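The two-step recipe above can be sketched on its simplest instance, a locally private multiclass problem. This is a minimal sketch under our own assumptions, not the authors' implementation. Four things are illustrative choices: the randomized-response channel (report the true label with probability `keep_prob`, otherwise a uniformly random label), the log-linear model p(y | x) proportional to exp(xᵀW[:, y]), the small L2 penalty, and the function names `randomized_response_channel`, `estimate_label_indicators`, and `fit_log_linear`. For a known channel matrix S, E[e_o | y] = S e_y. Solving S z = e_o therefore gives an unbiased estimate z of the unobserved label indicator. The sufficient statistics (1/n) Σ x_i z_iᵀ can then be estimated without inferring any labels, and the parameters come from a concave moment-matching objective.

```python
import numpy as np


def randomized_response_channel(num_labels, keep_prob):
    """Channel matrix S with S[o, y] = P(observe o | true label y).

    Assumed mechanism (illustrative): with probability keep_prob the true
    label is reported; otherwise a label drawn uniformly at random is reported.
    """
    uniform = np.full((num_labels, num_labels), 1.0 / num_labels)
    return keep_prob * np.eye(num_labels) + (1.0 - keep_prob) * uniform


def estimate_label_indicators(observed, channel):
    """Step 1: solve S z_i = e_{o_i}; since E[e_o | y] = S e_y, E[z_i | y_i] = e_{y_i}."""
    one_hot = np.eye(channel.shape[0])[observed]       # row i is e_{o_i}
    return np.linalg.solve(channel, one_hot.T).T       # row i is S^{-1} e_{o_i}


def fit_log_linear(X, label_estimates, steps=2000, lr=0.5, reg=1e-3):
    """Step 2: maximize <W, mu_hat> - mean_i log sum_y exp(x_i^T W[:, y]) - reg/2 ||W||^2.

    The objective is concave in W, so plain gradient ascent suffices.
    """
    n, d = X.shape
    mu_hat = X.T @ label_estimates / n                 # estimated sufficient statistics (d x k)
    W = np.zeros((d, label_estimates.shape[1]))
    for _ in range(steps):
        scores = X @ W
        scores -= scores.max(axis=1, keepdims=True)    # numerical stability
        probs = np.exp(scores)
        probs /= probs.sum(axis=1, keepdims=True)      # model p(y | x_i)
        grad = mu_hat - X.T @ probs / n - reg * W      # estimated minus model moments
        W += lr * grad
    return W


# Usage on synthetic data (all values below are illustrative).
rng = np.random.default_rng(0)
n, k, keep_prob = 5000, 3, 0.5
X = np.hstack([rng.normal(size=(n, 2)), np.ones((n, 1))])   # two features plus a bias
W_true = np.array([[3.0, -1.5, -1.5],
                   [0.0, 2.6, -2.6],
                   [0.0, 0.0, 0.0]])
scores = X @ W_true
p_true = np.exp(scores - scores.max(axis=1, keepdims=True))
p_true /= p_true.sum(axis=1, keepdims=True)
u = rng.random(n)
y = np.minimum((p_true.cumsum(axis=1) < u[:, None]).sum(axis=1), k - 1)  # sample true labels

# Privatize: keep y with probability keep_prob, else report a uniform label.
keep = rng.random(n) < keep_prob
observed = np.where(keep, y, rng.integers(0, k, size=n))

S = randomized_response_channel(k, keep_prob)
Z = estimate_label_indicators(observed, S)
W_hat = fit_log_linear(X, Z)
accuracy = np.mean(np.argmax(X @ W_hat, axis=1) == y)
print(f"accuracy against the hidden true labels: {accuracy:.3f}")
```

In this sketch, step 1 never touches the model parameters, and step 2 is an ordinary concave maximization. This is how it avoids both a non-convex objective and per-example marginal inference. The rows of `Z` can have negative entries, but each row sums to one because the columns of `S` do.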