Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches

We demonstrate the effectiveness of multilingual learning for unsupervised part-of-speech tagging. The central assumption of our work is that by combining cues from multiple languages, the structure of each becomes more apparent. We consider two ways of applying this intuition to unsupervised part-of-speech tagging: a model that directly merges the tag structures of a pair of languages into a single sequence, and a second model that instead incorporates multilingual context through latent variables. Both approaches are formulated as hierarchical Bayesian models, with Markov chain Monte Carlo sampling used for inference. Our results show that incorporating multilingual evidence yields substantial performance gains across a range of scenarios. We also find that performance improves steadily as the number of available languages increases.
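The abstract names Markov chain Monte Carlo inference but gives no details. For orientation only, the sketch below shows a collapsed Gibbs sampler for a plain monolingual bigram HMM tagger with symmetric Dirichlet priors on transitions and emissions. It is not either of the multilingual models described above: it omits merged cross-lingual tag sequences and multilingual latent variables entirely. The function name gibbs_bigram_hmm, the hyperparameters alpha and beta, and the fixed boundary state 0 are illustrative assumptions.

```python
import random


def gibbs_bigram_hmm(sentences, num_tags, alpha=1.0, beta=0.1, iters=100, seed=0):
    """Collapsed Gibbs sampling for a monolingual bigram Bayesian HMM.

    Illustrative monolingual baseline only; not the paper's multilingual models.
    sentences: list of lists of word strings (untagged).
    Word tags are 1..num_tags; state 0 is a fixed sentence-boundary state.
    alpha, beta: symmetric Dirichlet hyperparameters (transitions, emissions).
    Returns one tag list per sentence, taken from the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for s in sentences for w in s})
    num_states = num_tags + 1            # word tags plus boundary state 0
    emit, trans = {}, {}                 # (tag, word) / (prev, next) -> count
    emit_tot = [0] * num_states          # tokens carrying each tag
    trans_out = [0] * num_states         # transitions leaving each state

    def bump(table, key, delta):
        table[key] = table.get(key, 0) + delta

    def touch(tags, words, i, delta):
        # Add (+1) or remove (-1) the emission at padded position i and the
        # two transitions entering and leaving it.
        t = tags[i]
        bump(emit, (t, words[i - 1]), delta)
        emit_tot[t] += delta
        bump(trans, (tags[i - 1], t), delta)
        trans_out[tags[i - 1]] += delta
        bump(trans, (t, tags[i + 1]), delta)
        trans_out[t] += delta

    # Random initialisation; each tag list is padded with state 0 at both ends.
    tagged = []
    for words in sentences:
        tags = [0] + [rng.randint(1, num_tags) for _ in words] + [0]
        for i in range(1, len(words) + 1):
            bump(emit, (tags[i], words[i - 1]), 1)
            emit_tot[tags[i]] += 1
        for i in range(len(tags) - 1):
            bump(trans, (tags[i], tags[i + 1]), 1)
            trans_out[tags[i]] += 1
        tagged.append(tags)

    for _ in range(iters):
        for words, tags in zip(sentences, tagged):
            for i in range(1, len(words) + 1):
                touch(tags, words, i, -1)  # exclude position i's own counts
                prev, nxt, w = tags[i - 1], tags[i + 1], words[i - 1]
                weights = []
                for t in range(1, num_tags + 1):
                    p_emit = (emit.get((t, w), 0) + beta) / (emit_tot[t] + vocab_size * beta)
                    p_in = (trans.get((prev, t), 0) + alpha) / (trans_out[prev] + num_states * alpha)
                    # If prev == t, the transition prev->t also changes the
                    # counts seen by the outgoing transition t->nxt.
                    same = 1 if prev == t else 0
                    both = 1 if prev == t == nxt else 0
                    p_out = (trans.get((t, nxt), 0) + both + alpha) / (
                        trans_out[t] + same + num_states * alpha)
                    weights.append(p_emit * p_in * p_out)
                tags[i] = rng.choices(range(1, num_tags + 1), weights=weights)[0]
                touch(tags, words, i, +1)  # restore counts with the new tag
    return [tags[1:-1] for tags in tagged]


if __name__ == "__main__":
    corpus = [["the", "dog", "barks"], ["a", "cat", "sleeps"], ["the", "cat", "barks"]]
    print(gibbs_bigram_hmm(corpus, num_tags=3, iters=50))
```

Each update removes position i's counts, scores every candidate tag by the product of its emission probability and the probabilities of entering and leaving it, and then resamples. The indicator terms cover the case where the incoming and outgoing transitions share counts, which keeps the conditional exact for this collapsed model.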
