incom.py - A Toolbox for Calculating Linguistic Distances and Asymmetries between Related Languages

Related languages differ in how distant they are from one another, and their mutual intelligibility may be asymmetric. In this paper we introduce incom.py, a toolbox for calculating linguistic distances and asymmetries between related languages. incom.py allows linguists to quickly and easily perform statistical analyses and compare the results with experimental data. We demonstrate the efficacy of incom.py in an intercomprehension experiment on two Slavic languages, Bulgarian and Russian. Using incom.py, we validated three measures of linguistic distance and asymmetry (Levenshtein distance, word adaptation surprisal, and conditional entropy) as predictors of success in a reading intercomprehension experiment.
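
To make the three measures concrete, the following is a minimal Python sketch of how each could be computed. It is not the incom.py API: the function names, the character-alignment format (a list of source/target character pairs), and the toy data are assumptions made here for illustration only, and the toy pairs are not real Bulgarian-Russian correspondences.

```python
from collections import Counter
from math import log2


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_levenshtein(a, b):
    """Levenshtein distance divided by the length of the longer string."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def conditional_entropy(pairs):
    """H(target | source) in bits, estimated from aligned (source, target) pairs."""
    joint = Counter(pairs)
    source_counts = Counter(src for src, _ in pairs)
    total = len(pairs)
    h = 0.0
    for (src, _tgt), n in joint.items():
        h -= (n / total) * log2(n / source_counts[src])
    return h


def word_adaptation_surprisal(word_alignment, pairs):
    """Sum of -log2 P(target | source) over one word's aligned character pairs."""
    joint = Counter(pairs)
    source_counts = Counter(src for src, _ in pairs)
    return sum(-log2(joint[(s, t)] / source_counts[s]) for s, t in word_alignment)


# Hypothetical toy alignment: source "a" maps to either "a" or "o",
# while every target character has exactly one source.
toy_pairs = [("a", "a"), ("a", "o"), ("b", "b"), ("c", "c")]
reversed_pairs = [(tgt, src) for src, tgt in toy_pairs]

forward = conditional_entropy(toy_pairs)        # uncertainty going source -> target
backward = conditional_entropy(reversed_pairs)  # uncertainty going target -> source
surprisal = word_adaptation_surprisal([("a", "o"), ("b", "b")], toy_pairs)
```

In this toy data the two directions yield different entropies, which is the kind of asymmetry the abstract refers to: Levenshtein distance is symmetric by construction, whereas the two probabilistic measures depend on the direction of reading.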
