Unsupervised Multilingual Learning for POS Tagging

We demonstrate the effectiveness of multilingual learning for unsupervised part-of-speech tagging. The key hypothesis of multilingual learning is that by combining cues from multiple languages, the structure of each becomes more apparent. We formulate a hierarchical Bayesian model for jointly predicting bilingual streams of part-of-speech tags. The model learns language-specific features while capturing cross-lingual patterns in tag distribution for aligned words. Once the parameters of our model have been learned on bilingual parallel data, we evaluate its performance on a held-out monolingual test set. Our evaluation on six pairs of languages shows consistent and significant performance gains over a state-of-the-art monolingual baseline. For one language pair, we observe a relative reduction in error of 53%.

Regina Barzilay | Jacob Eisenstein | Tahira Naseem | Benjamin Snyder
