Quantifying Word Order Freedom in Dependency Corpora

Using recently available dependency corpora, we present novel measures of a key quantitative property of language, word order freedom: the extent to which word order in a sentence is free to vary while conveying the same meaning. We address two topics. First, we discuss linguistic and statistical issues associated with our measures and with the annotation styles of available corpora. We find that we can measure reliable upper bounds on word order freedom in head direction and the ordering of certain sisters, but that more general measures of word order freedom are not currently feasible. Second, we present results of our measures in 34 languages and demonstrate a correlation between quantitative word order freedom of subjects and objects and the presence of nominative-accusative case marking. To our knowledge, this is the first large-scale quantitative test of the hypothesis that languages with more word order freedom have more case marking (Sapir, 1921; Kiparsky, 1997).
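As a rough illustration of what a head-direction measure of this kind could look like, the sketch below computes the conditional entropy of head direction given the dependency relation label. The abstract does not state the estimator, so the choice of conditional entropy, the function name `head_direction_entropy`, the arc representation, and the toy arcs are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def head_direction_entropy(arcs):
    """Plug-in estimate of H(direction | relation) in bits.

    `arcs` is an iterable of (relation, head_index, dependent_index)
    tuples, where the indices are word positions in the sentence.
    This representation is a hypothetical one chosen for the sketch.
    """
    # Count head-first vs. head-last occurrences per relation label.
    counts = defaultdict(Counter)
    for relation, head, dependent in arcs:
        direction = "head_first" if head < dependent else "head_last"
        counts[relation][direction] += 1

    total = sum(sum(c.values()) for c in counts.values())
    entropy = 0.0
    for c in counts.values():
        n_rel = sum(c.values())
        # Entropy of the direction distribution for this relation.
        h_rel = -sum((k / n_rel) * math.log2(k / n_rel) for k in c.values())
        # Weight by the relative frequency of the relation.
        entropy += (n_rel / total) * h_rel
    return entropy


# Invented toy data: subjects vary in position, determiners never do.
toy_arcs = [
    ("nsubj", 2, 1), ("nsubj", 2, 1), ("nsubj", 2, 3), ("nsubj", 2, 3),
    ("det", 3, 2), ("det", 5, 4), ("det", 2, 1), ("det", 4, 3),
]
print(head_direction_entropy(toy_arcs))  # 0.5 for this toy data
```

A value of 0 would mean head direction is fully determined by the relation label, and higher values mean more variation. Plug-in entropy estimates of this kind are biased downward in small samples, so a real analysis would need a bias correction or resampling step.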

[1] Joseph H. Greenberg et al. Some Universals of Grammar with Particular Reference to the Order of Meaningful Elements. On Language, 1990.

[2] Haitao Liu et al. Dependency direction as a means of word-order typology: A method based on dependency treebanks. 2010.

[3] M. Dryer. The Greenbergian word order correlations. 1992.

[4] George Kingsley Zipf et al. Human behavior and the principle of least effort. 1949.

[5] Joakim Nivre et al. Universal Stanford dependencies: A cross-linguistic typology. LREC, 2014.

[6] Sara Klingenstein et al. Bootstrap Methods for the Empirical Study of Decision-Making and Information Flows in Social Systems. Entropy, 2013.

[7] Rudolf Rosa et al. HamleDT 2.0: Thirty Dependency Treebanks Stanfordized. LREC, 2014.

[8] R. Saxe et al. A Noisy-Channel Account of Crosslinguistic Word-Order Variation. Psychological Science, 2013.

[9] Veronika Laippala et al. Universal Dependencies 1.4. 2015.

[10] Joakim Nivre et al. Towards a Universal Grammar for Natural Language Processing. CICLing, 2015.

[11] Marco Kuhlmann et al. Mildly Non-Projective Dependency Grammar. CL, 2013.

[12] Victor S. Ferreira et al. Given-New Ordering Effects on the Production of Scrambled Sentences in Japanese. Journal of Psycholinguistic Research, 2003.

[13] Judith Aissen et al. Differential Object Marking: Iconicity vs. Economy. 2003.

[14] Edward Sapir et al. Language: An Introduction to the Study of Speech. 1955.

[15] Franklin Chang et al. Learning to order words: A connectionist model of heavy NP shift and accessibility effects in Japanese and English. 2009.

[16] N. Ohashi et al. Agreement. 2002.

[17] Richard Futrell et al. Cross-linguistic gestures reflect typological universals: A subject-initial, verb-final bias in speakers of diverse languages. Cognition, 2015.

[18] Daniel Zeman et al. HamleDT: To Parse or Not to Parse? LREC, 2012.

[19] Thomas McFadden et al. On morphological case and word-order freedom. 2003.

[20] Paul Kiparsky et al. The Rise of Positional Licensing. 1997.

[21] J. Nichols. Head-marking and dependent-marking grammar. 1986.

[22] Haitao Liu et al. Language clusters based on linguistic complex networks. 2010.

[23] Dan Klein et al. Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency. ACL, 2004.

[24] Alexander Mehler et al. Automatic Language Classification by means of Syntactic Dependency Networks. J. Quant. Linguistics, 2011.

[25] G. A. Miller et al. Note on the bias of information estimates. 1955.

[26] Steven Abney et al. The English Noun Phrase in its Sentential Aspect. 1972.