Tracking Typological Traits of Uralic Languages in Distributed Language Representations

Although linguistic typology has a long history, computational approaches to it have only recently gained popularity. Distributed representations have likewise become increasingly popular in computational linguistics. A recent development is to learn distributed representations of languages such that typologically similar languages lie close to one another in the representation space. Although such language representations have been shown to be empirically successful, they have not been subjected to much typological probing. In this paper, we first examine whether language representations of this type are empirically useful for model transfer between Uralic languages in deep neural networks. We then investigate which typological features are encoded in these representations by attempting to predict features in the World Atlas of Language Structures (WALS) at various stages of fine-tuning of the representations. Focusing on Uralic languages, we find that some typological traits can be automatically inferred with accuracies well above a strong baseline.
