A Common Case of Jekyll and Hyde: The Synergistic Effect of Using Divided Source Training Data for Feature Augmentation

Feature augmentation is a well-known domain adaptation method that has proven effective on several NLP tasks (Daume III, 2007). A limitation of the method, however, is that it requires labeled data from the target domain, which is often unavailable. In this paper, we propose to use training data selection to divide the source-domain training data into two parts, pseudo target data (the selected part) and source data (the unselected part), and then apply feature augmentation to the two parts. This approach has two advantages: first, feature augmentation can be applied even when there is no labeled data from the target domain; second, the approach exploits all of the training data, including the part not chosen by data selection. We evaluate the approach on Chinese word segmentation and part-of-speech tagging and show that it outperforms a baseline where no feature augmentation is applied.
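To make the pipeline concrete, here is a minimal Python sketch (our illustration, not the authors' released code) of the two steps: ranking labeled source sentences by similarity to unlabeled target-domain text to split them into pseudo target and source parts, and then applying Daume III's (2007) feature augmentation, which keeps a shared copy of every feature and adds a domain-tagged copy. The character n-gram similarity measure, the selection ratio, and the function names (select_pseudo_target, augment) are assumptions made for illustration; the paper's actual selection criteria may differ.

```python
from collections import Counter

def char_ngrams(text, n=2):
    """Character n-gram counts; character-level units suit Chinese text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def similarity(sentence, target_profile, n=2):
    """Fraction of the sentence's character n-grams seen in the target text."""
    grams = char_ngrams(sentence, n)
    total = sum(grams.values())
    if total == 0:
        return 0.0
    hits = sum(c for g, c in grams.items() if g in target_profile)
    return hits / total

def select_pseudo_target(source_sentences, target_text, ratio=0.3):
    """Divide labeled source data into (pseudo_target, remaining_source)."""
    profile = char_ngrams(target_text)
    ranked = sorted(source_sentences,
                    key=lambda s: similarity(s, profile), reverse=True)
    k = int(len(ranked) * ratio)
    return ranked[:k], ranked[k:]

def augment(features, domain):
    """Feature augmentation: a shared copy plus a domain-tagged copy."""
    return features + [f"{domain}:{f}" for f in features]

if __name__ == "__main__":
    source = ["the cat sat", "stock prices fell", "the dog ran"]
    target = "market prices rose sharply"   # unlabeled target-domain text
    pseudo_tgt, src = select_pseudo_target(source, target, ratio=0.34)
    # Instances from each part get distinct domain tags, so the learner can
    # share weights on the general copies while keeping part-specific ones.
    print(augment(["w=prices", "len=6"], "tgt"))
    print(augment(["w=cat", "len=3"], "src"))
```

Because both parts of the divided data pass through the same augmentation, no labeled target-domain data is needed, yet none of the source training data is discarded.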

[1] Chunyu Kit and Yorick Wilks. Unsupervised Learning of Word Boundary with Description Length Gain. CoNLL, 1999.

[2] Matthias Eck, Stephan Vogel, and Alex Waibel. Low Cost Portability for Statistical Machine Translation based on N-gram Coverage. 2005.

[3] Yan Song and Fei Xia. Using a Goodness Measurement for Domain Adaptation: A Case Study on Chinese Word Segmentation. LREC, 2012.

[4] Chunyu Kit. Unsupervised lexical learning as inductive inference. 2000.

[5] Hal Daumé III. Frustratingly Easy Domain Adaptation. ACL, 2007.

[6] Fei Xia, Martha Palmer, Nianwen Xue, et al. Developing Guidelines and Ensuring Consistency for Chinese Text Annotation. LREC, 2000.

[7] Chunyu Kit. Unsupervised Lexical Learning as Inductive Inference via Compression. 2000.

[8] Dragos Stefan Munteanu and Daniel Marcu. Improving Machine Translation Performance by Exploiting Non-Parallel Corpora. Computational Linguistics, 2005.

[9] Almut Silja Hildebrand, Matthias Eck, Stephan Vogel, and Alex Waibel. Adaptation of the Translation Model for Statistical Machine Translation Based on Information Retrieval. EAMT, 2005.

[10] Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. NAACL, 2003.

[11] Robert C. Moore and William D. Lewis. Intelligent Selection of Language Model Training Data. ACL, 2010.

[12] Barbara Plank and Gertjan van Noord. Effective Measures of Domain Similarity for Parsing. ACL, 2011.

[13] Yajuan Lü, Jin Huang, and Qun Liu. Improving Statistical Machine Translation Performance by Training Data Selection and Optimization. EMNLP-CoNLL, 2007.

[14] Amittai Axelrod, Xiaodong He, and Jianfeng Gao. Domain Adaptation via Pseudo In-Domain Data Selection. EMNLP, 2011.

[15] Yan Song, Prescott Klassen, Fei Xia, and Chunyu Kit. Entropy-based Training Data Selection for Domain Adaptation. COLING, 2012.