Coupled Sequence Labeling on Heterogeneous Annotations: POS Tagging as a Case Study

To make effective use of multiple datasets with heterogeneous annotations, this paper proposes a coupled sequence labeling model that can directly learn and infer two heterogeneous annotations simultaneously. To facilitate discussion, we use Chinese part-of-speech (POS) tagging as our case study. The key idea is to bundle two sets of POS tags together (e.g. “[NN, n]”) and build a conditional random field (CRF) based tagging model in the enlarged space of bundled tags with the help of ambiguous labelings. To train our model on two non-overlapping datasets, each of which has tags on only one side, we transform a one-side tag into a set of bundled tags by considering all possible mappings at the missing side, and we derive an objective function based on ambiguous labelings. The key advantage of our coupled model is the flexibility of 1) incorporating joint features on the bundled tags to implicitly learn the loose mapping between heterogeneous annotations, and 2) exploring separate features on one-side tags to overcome the data sparseness problem of using only bundled tags. Experiments on benchmark datasets show that our coupled model significantly outperforms the state-of-the-art baselines on both one-side POS tagging and annotation conversion tasks. The code and newly annotated data are released for non-commercial use at http://hlt.suda.edu.cn/~zhli.
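The step that turns a one-side tag into a set of bundled tags can be illustrated with a minimal sketch. It is not the released implementation: the two toy tag sets (beyond "NN" and "n", which appear in the example above), the names `TAGS_A`, `TAGS_B`, `BUNDLED`, and the function `ambiguous_labeling` are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy tag sets for the two annotation schemes.
# Only "NN" and "n" come from the abstract; the rest are made up.
TAGS_A = ["NN", "VV", "AD"]
TAGS_B = ["n", "v", "d"]

# Enlarged label space: every (side-A tag, side-B tag) pair is a bundled tag.
BUNDLED = list(product(TAGS_A, TAGS_B))


def ambiguous_labeling(tags, side):
    """Map each one-side tag to the set of bundled tags consistent with it.

    side=0 means the sentence carries side-A tags, side=1 means side-B tags.
    The missing side is left unconstrained, so every possible mapping is kept.
    """
    if side not in (0, 1):
        raise ValueError("side must be 0 or 1")
    return [{b for b in BUNDLED if b[side] == t} for t in tags]


# A sentence annotated only with side-A tags.
labeling = ambiguous_labeling(["NN", "VV"], side=0)
for position, candidates in enumerate(labeling):
    print(position, sorted(candidates))
```

Each position keeps one candidate per tag on the missing side, so a training objective over ambiguous labelings would sum over all bundled sequences that stay inside these per-position sets rather than score a single gold sequence.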
