Reconstructing an Indo-European Family Tree from Non-native English Texts

Mother tongue interference is the phenomenon in which linguistic systems of a speaker's mother tongue are transferred to another language. Although there has been a large body of work on mother tongue interference, very little is known about how strongly it is transferred to another language and what relations hold across mother tongues. To address these questions, this paper explores and visualizes the mother tongue interference preserved in English texts written by speakers of Indo-European languages. It further explores the linguistic features that explain why certain relations are preserved in English writing and that contribute to related tasks such as native language identification.
