Partial Or Complete, That’s The Question

For many structured learning tasks, data annotation is complex and costly. Existing annotation schemes usually aim to acquire completely annotated structures, under the common perception that partial structures are of low quality and can hurt learning. This paper questions that perception, motivated by the fact that structures consist of interdependent sets of variables: given a fixed budget, partially annotating each structure may provide the same level of supervision while allowing more structures to be annotated. We provide an information-theoretic formulation of this perspective and use it, in the context of three diverse structured learning tasks, to show that learning from partial structures can sometimes outperform learning from complete ones. Our findings may provide important insights into structured data annotation schemes and could support progress in learning protocols for structured tasks.
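The fixed-budget argument can be made concrete with a toy calculation (this is an illustrative sketch, not the paper's actual formulation; the two-variable structure, the correlation parameter `eps`, and the budget are all assumptions). Each structure has two interdependent binary variables; labeling one variable costs one unit. With budget B, complete annotation covers B/2 structures and removes H(X1, X2) bits of uncertainty per structure, while partial annotation covers B structures and removes H(X1) bits per structure. When the variables are strongly correlated, H(X2 | X1) is small, so the partial scheme removes more total uncertainty:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Toy joint P(x1, x2): x1 is uniform; x2 copies x1 with prob 1 - eps,
# so small eps means strongly interdependent variables.
eps = 0.1
joint = {(a, b): 0.5 * ((1 - eps) if a == b else eps)
         for a in (0, 1) for b in (0, 1)}

H_joint = entropy(joint.values())   # H(X1, X2)
H_x1 = entropy([0.5, 0.5])          # H(X1) = 1 bit
H_x2_given_x1 = H_joint - H_x1      # chain rule: H(X2 | X1)

budget = 100  # total single-variable labels we can afford

# Complete scheme: both variables labeled in budget/2 structures.
complete_bits = (budget / 2) * H_joint
# Partial scheme: one variable labeled in each of budget structures.
partial_bits = budget * H_x1

print(f"complete: {complete_bits:.1f} bits, partial: {partial_bits:.1f} bits")
```

Under these assumptions the partial scheme wins exactly when H(X1) > H(X2 | X1), i.e., when the unlabeled variable is largely determined by the labeled one; with independent variables the two schemes remove the same total uncertainty.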
