Type-Supervised Domain Adaptation for Joint Segmentation and POS-Tagging

We report an empirical investigation of type-supervised domain adaptation for joint Chinese word segmentation and POS-tagging. Given a set of annotated source-domain sentences, we use domain-specific tag dictionaries and only unlabeled target-domain data to improve target-domain accuracy. Previous work on POS-tagging of other languages showed that type-supervision can be a competitive alternative to token-supervision, and that semi-supervised techniques such as label propagation are important to the effectiveness of type-supervision. We report similar findings using a novel approach for joint Chinese segmentation and POS-tagging in a cross-domain setting. With the help of unlabeled sentences and a lexicon of 3,000 words, we obtain a 33% error reduction in target-domain tagging. In addition, combining type- and token-supervision can lead to improved cost-effectiveness.
