Dependency Parsing with Noisy Multi-annotation Data

In the past few years, the performance of dependency parsing has improved by a large margin on closed-domain benchmark datasets. However, when processing real-life texts, parsing performance degrades dramatically. Besides domain adaptation, which has made slow progress due to its intrinsic difficulty, one straightforward remedy is to annotate a certain amount of syntactic data for the new source of texts. However, data annotation is well known to be time-consuming and labor-intensive, especially for complex syntactic annotation. Inspired by progress in crowdsourcing, this paper proposes to collect noisy multi-annotation syntactic data from non-expert annotators. Each sentence is independently annotated by multiple annotators, and the inconsistencies among their annotations are retained. This allows data to be annotated very rapidly, since many ordinary annotators can be recruited. We then construct and release three multi-annotation datasets from different sources. Finally, we propose and compare several benchmark approaches to training dependency parsers on such multi-annotation data. We will release our code and data at http://hlt.suda.edu.cn/~zhli/.
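
The abstract does not spell out the benchmark approaches it compares. As a rough illustration of the simplest conceivable baseline for exploiting such data, the Python sketch below aggregates the annotators' head assignments by per-token majority vote; the function name and data layout are hypothetical, not taken from the paper.

    from collections import Counter

    def aggregate_heads(annotations):
        """Token-level majority vote over multiple dependency annotations.

        `annotations` is a list of head sequences, one per annotator; each
        sequence gives, for every token, the index of its syntactic head
        (0 denotes the root). Ties are broken in favor of the annotator
        listed first.
        """
        n_tokens = len(annotations[0])
        aggregated = []
        for i in range(n_tokens):
            votes = Counter(ann[i] for ann in annotations)
            # most_common() sorts by count and is stable, so on a tie the
            # head value seen first (the first annotator's) wins.
            aggregated.append(votes.most_common(1)[0][0])
        return aggregated

    # Three annotators label the same 4-token sentence; they agree on
    # every head except that of the third token, where annotator 2 is
    # outvoted by the other two.
    annotator_trees = [
        [2, 0, 2, 2],
        [2, 0, 4, 2],
        [2, 0, 2, 2],
    ]
    print(aggregate_heads(annotator_trees))  # -> [2, 0, 2, 2]

Note that per-token voting need not yield a well-formed tree (cycles or multiple roots are possible), so a practical aggregator would typically run a maximum-spanning-tree decoder such as Chu-Liu/Edmonds over the vote counts; more refined approaches retain all annotations and weight them during parser training.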
