An exploratory study of L1-specific non-words

In this paper, we explore L1-specific non-words, i.e. non-words in a target language (in this case Swedish) that are re-ranked by a language model of a different language. We surmise that speakers of a given L1 will react differently to L1-specific non-words than to general non-words. We present the results of two small case studies exploring whether re-ranking non-words with different language models leads to a perceived difference in 'Swedishness' (pilot study 1) and whether German and English native speakers have longer reaction times in a lexical decision task when presented with their respective L1-specific non-words (pilot study 2). Tentative results suggest that L1-specific non-words are processed second-slowest, after purely Swedish-looking non-words.
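To make the idea of re-ranking concrete, the following is a minimal sketch and not the authors' implementation. It assumes a character-bigram model with add-one smoothing as the "language model", and it uses a tiny hypothetical German word list and invented candidate non-words purely for illustration; all function names are likewise assumptions.

```python
import math
from collections import Counter


def train_char_bigram(words):
    """Count character bigrams (with ^ and $ as boundary markers)."""
    bigrams, contexts, alphabet = Counter(), Counter(), set()
    for w in words:
        padded = "^" + w.lower() + "$"
        alphabet.update(padded)
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts, alphabet


def avg_log_prob(word, model):
    """Length-normalised log-probability of a word under the bigram model."""
    bigrams, contexts, alphabet = model
    padded = "^" + word.lower() + "$"
    v = len(alphabet) + 1  # +1 reserves probability mass for unseen characters
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        # Add-one smoothing keeps unseen bigrams from getting zero probability.
        total += math.log((bigrams[(a, b)] + 1) / (contexts[a] + v))
    return total / (len(padded) - 1)


def rerank(nonwords, model):
    """Sort candidate non-words from most to least likely under the model."""
    return sorted(nonwords, key=lambda w: avg_log_prob(w, model), reverse=True)


# Hypothetical toy data: a real setup would train on a large L1 word list.
GERMAN_TOY = ["schule", "schnee", "schrank", "tisch"]
german_model = train_char_bigram(GERMAN_TOY)

# Invented candidates standing in for generated Swedish non-words.
ranked = rerank(["wuthy", "schim"], german_model)
```

Under this sketch, candidates that rank highest under the German model would count as German-specific non-words; swapping in a model trained on English or Swedish word lists would produce the other rankings compared in the two pilot studies.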
