Uncovering Probabilistic Implications in Typological Knowledge Bases

The study of linguistic typology is rooted in the implications found between linguistic features: for example, languages with object-verb word order tend to have postpositions. Uncovering such implications typically requires time-consuming manual analysis by trained and experienced linguists, which potentially leaves key linguistic universals unexplored. In this paper, we present a computational model that not only identifies known universals, including Greenberg universals, but also uncovers new ones worthy of further linguistic investigation. Our approach outperforms baselines previously used for this problem, as well as a strong baseline from knowledge base population.
