Comparing morphological complexity of Spanish, Otomi and Nahuatl

We use two small parallel corpora to compare the morphological complexity of Spanish, Otomi and Nahuatl. These languages belong to different linguistic families, and the latter two are low-resourced. We consider two quantitative criteria: on the one hand, the distribution of types over tokens in a corpus; on the other, perplexity and entropy as indicators of how predictable word structure is. We show that a language can be complex in terms of how many different morphological word forms it can produce and yet be less complex in terms of the predictability of the internal structure of its words.
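The two criteria can be illustrated with a minimal sketch. It assumes a simple type-token ratio and a maximum-likelihood unigram model over whatever units are passed in (words, morphs or characters); the abstract does not specify the language model or segmentation actually used, so the function names and the toy input below are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter


def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by the number of tokens."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    return len(set(tokens)) / len(tokens)


def unigram_entropy(units):
    """Shannon entropy in bits of the empirical unigram distribution.

    Assumption: an unsmoothed maximum-likelihood unigram model stands in
    for the language model here; it is not the model used in the paper.
    """
    if not units:
        raise ValueError("units must be non-empty")
    counts = Counter(units)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def perplexity(entropy_bits):
    """Perplexity corresponding to an entropy expressed in bits."""
    return 2 ** entropy_bits


# Hypothetical toy input, not data from the corpora described above.
toy_tokens = "a b a c".split()
ttr = type_token_ratio(toy_tokens)
entropy = unigram_entropy(toy_tokens)
ppl = perplexity(entropy)
```

A higher type-token ratio on parallel text suggests more distinct word forms for the same content, while lower entropy and perplexity over sub-word units suggest more predictable internal word structure; the two measures need not agree for a given language.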
