Coupled POS Tagging on Heterogeneous Annotations

The limited scale and genre coverage of labeled data greatly hinders the effectiveness of supervised models, especially when analyzing spoken language, such as text transcribed from speech, and informal text, such as tweets and product comments on the Internet. To effectively utilize multiple labeled datasets with heterogeneous annotations for the same task, this paper proposes a coupled sequence labeling model that can directly learn and infer two heterogeneous annotations simultaneously, using Chinese part-of-speech (POS) tagging as our case study. The key idea is to bundle two sets of POS tags together (e.g., “[NN, n]”) and build a conditional random field (CRF) based tagging model in the enlarged space of bundled tags with the help of ambiguous labeling. To train our model on two non-overlapping datasets that each have only one-side tags, we transform a one-side tag into a set of bundled tags by concatenating the tag with every possible tag on the missing side according to a predefined context-free tag-to-tag mapping function, thus producing ambiguous labeling as weak supervision. We design and investigate four different context-free tag-to-tag mapping functions and find that the coupled model achieves its best performance when each one-side tag is mapped to all tags on the other side (namely, complete mapping), indicating that the model can effectively learn the loose mapping between the two heterogeneous annotations without manually designed mapping rules. Moreover, we propose a context-aware online pruning strategy that captures mapping relationships between annotations more accurately based on contextual evidence and thus effectively solves the severe inefficiency problem of our coupled model under complete mapping, making its efficiency comparable to that of the baseline CRF model. Experiments on benchmark datasets show that our coupled model significantly outperforms the state-of-the-art baselines on both one-side POS tagging and annotation conversion tasks. The code and newly annotated data are released for research use at http://hlt.suda.edu.cn/~zhli.
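
The bundling step and the resulting weak supervision can be illustrated with a minimal Python sketch. It is not the released implementation: the toy tag sets, all function names, and the placeholder scoring functions `emit` and `trans` are assumptions made for illustration. The objective shown, the log of the total probability of every bundled-tag sequence consistent with the ambiguous labels, is one common way to train a CRF with ambiguous labeling and is assumed here rather than taken from the text above.

```python
import math
from itertools import product

TAGS_A = ["NN", "VV"]  # toy first-side tag set (assumed for illustration)
TAGS_B = ["n", "v"]    # toy second-side tag set (assumed for illustration)
BUNDLES = list(product(TAGS_A, TAGS_B))  # enlarged space of bundled tags


def complete_mapping(tag, other_side_tags):
    """Complete mapping: a one-side tag maps to every tag on the other side."""
    return set(other_side_tags)


def ambiguous_labels(tags, side, mapping=complete_mapping):
    """Turn a one-side tag sequence into one set of bundled tags per token.

    side="a" means the tags come from the first annotation, side="b" from
    the second; the missing side is filled in by the mapping function.
    """
    labels = []
    for tag in tags:
        if side == "a":
            labels.append({(tag, b) for b in mapping(tag, TAGS_B)})
        else:
            labels.append({(a, tag) for a in mapping(tag, TAGS_A)})
    return labels


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(allowed, emit, trans):
    """Forward algorithm over bundled tags, restricted to allowed[i] at token i."""
    alpha = {t: emit(0, t) for t in allowed[0]}
    for i in range(1, len(allowed)):
        alpha = {
            t: emit(i, t) + log_sum_exp([alpha[p] + trans(p, t) for p in alpha])
            for t in allowed[i]
        }
    return log_sum_exp(list(alpha.values()))


def ambiguous_log_likelihood(labels, emit, trans):
    """Log of the summed probability of all bundled sequences consistent with labels."""
    everything = [set(BUNDLES)] * len(labels)
    return log_partition(labels, emit, trans) - log_partition(everything, emit, trans)


# Usage: a two-token sentence annotated only with first-side tags.
labels = ambiguous_labels(["NN", "VV"], side="a")
# With uniform (all-zero) scores, each token keeps 2 of the 4 bundled tags,
# so the consistent sequences carry (2/4) ** 2 of the total probability mass.
ll = ambiguous_log_likelihood(labels, lambda i, t: 0.0, lambda p, t: 0.0)
print(math.exp(ll))  # approximately 0.25
```

Under complete mapping the allowed set at each token grows with the size of the other-side tag set, which is why the bundled space becomes expensive to search and why pruning matters for efficiency.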
