Unsupervised multilingual learning

For centuries, scholars have explored the deep links among human languages. In this thesis, we present a class of probabilistic models that exploit these links as a form of naturally occurring supervision. These models allow us to substantially improve performance for core text processing tasks, such as morphological segmentation, part-of-speech tagging, and syntactic parsing. Besides these traditional NLP tasks, we also present a multilingual model for lost language decipherment. We test this model on the ancient Ugaritic language. Our results show that we can automatically uncover much of the historical relationship between Ugaritic and Biblical Hebrew, a known related language.
