Classifying Syntactic Regularities for Hundreds of Languages

This paper presents a comparison of classification methods for linguistic typology, with the goal of expanding an extensive but sparse language resource: the World Atlas of Language Structures (WALS) (Dryer and Haspelmath, 2013). We experimented with a variety of regression and nearest-neighbor classification methods over a set of 325 languages and six syntactic rules drawn from WALS. To predict the value of each rule for a language, we consider three kinds of features: the typological features given by the other five rules; linguistic features extracted from a word-aligned Bible in that language; and the genealogical features (genus and family) of that language. Overall, we find that propagating the majority label among all languages of the same genus achieves the best accuracy in label prediction. The next best performance comes from a logistic regression model that combines typological and linguistic features. Interestingly, this model outperforms propagating the majority label among all languages of the same family.
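The majority-label baselines mentioned above can be illustrated with a short sketch. This is a hypothetical illustration under stated assumptions, not the authors' implementation. The function names, the leave-one-out voting (a language never votes for itself), the alphabetical tie-breaking, and the toy languages and labels are all invented for illustration. Passing a genus map gives the genus baseline. Passing a family map instead gives the family baseline.

```python
from collections import Counter
from typing import Dict, List, Optional


# Hypothetical sketch: leave-one-out majority-label propagation within a group.
# Pass a genus map for the genus baseline, or a family map for the family baseline.
def group_majority_predict(
    labels: Dict[str, Optional[str]],
    group_of: Dict[str, str],
) -> Dict[str, Optional[str]]:
    # Index the members of each group (genus or family).
    members: Dict[str, List[str]] = {}
    for lang, grp in group_of.items():
        members.setdefault(grp, []).append(lang)

    preds: Dict[str, Optional[str]] = {}
    for lang, grp in group_of.items():
        # Count the known labels of the *other* languages in the same group.
        # The resource is sparse, so unlabeled languages (None) cast no vote.
        votes = Counter(
            labels[other]
            for other in members[grp]
            if other != lang and labels.get(other) is not None
        )
        if votes:
            # Highest count wins; ties are broken alphabetically (an assumption).
            preds[lang] = min(votes, key=lambda lab: (-votes[lab], lab))
        else:
            preds[lang] = None  # no labeled relative in the group: abstain
    return preds


def accuracy(preds: Dict[str, Optional[str]], labels: Dict[str, Optional[str]]) -> float:
    # Score only languages that have both a prediction and a gold label.
    scored = [
        lang for lang, p in preds.items()
        if p is not None and labels.get(lang) is not None
    ]
    if not scored:
        return 0.0
    return sum(preds[lang] == labels[lang] for lang in scored) / len(scored)


# Toy data (hypothetical language IDs, genus IDs, and word-order labels).
labels = {"A1": "SOV", "A2": "SOV", "A3": "SVO", "B1": "SVO", "B2": "SVO", "C1": "VSO"}
genus = {"A1": "g1", "A2": "g1", "A3": "g1", "B1": "g2", "B2": "g2", "C1": "g3"}

preds = group_majority_predict(labels, genus)
print(preds)
print(accuracy(preds, labels))  # 4 of the 5 scored languages are correct
```

In this toy run, "A3" is mislabeled because its two genus-mates outvote it. "C1" abstains because it has no labeled relative. Excluding abstentions from scoring is one possible design choice; counting them as errors would be another.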
