Investigating the Selection of Example Sentences for Unknown Target Words in ICALL Reading Texts for L2 German

This thesis considers possible criteria for the selection of example sentences for difficult or unknown words in reading texts for students of German as a Second Language (GSL). The examples are intended to be provided within an Intelligent Computer-Aided Language Learning (ICALL) Vocabulary Learning System, in which students can choose among several explanation options for difficult words. Some of these options (e.g., glosses) have received considerable attention in the ICALL and Second Language (L2) Acquisition literature; the literature on examples, by contrast, has been almost exclusively the province of lexicographers. The selection of examples is explored from an educational, L2 teaching point of view: the thesis is intended as a first exploration of what makes an example helpful to the L2 student from the perspective of L2 teachers. An important motivation for this work is that selecting examples from a dictionary or randomly from a corpus has several drawbacks: first, the number of available dictionary examples is limited; second, the examples do not take into account the context in which the word was encountered; and third, the rationale and precise principles behind the selection of dictionary examples are usually unclear. Central to this thesis is the hypothesis that a random selection of example sentences from a suitable corpus can be improved by a guided selection process that takes into account the characteristics of helpful examples. This hypothesis is investigated in an empirical study conducted with teachers of L2 German. The teacher data show four dimensions to be significant criteria that are amenable to analysis: (a) reduced syntactic complexity, (b) sentence similarity, (c) the provision of significant co-occurrences, and (d) the provision of semantically related words. Models based on these dimensions are developed using logistic regression analysis and evaluated in two further empirical studies with teachers and students of L2 German.
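The idea of a guided selection process can be illustrated with a minimal sketch that scores candidate corpus sentences on the four dimensions and ranks them with a logistic model. Everything concrete here is an assumption, not the thesis's method: token count stands in for syntactic complexity, bag-of-words cosine similarity to the sentence in which the word was encountered stands in for sentence similarity, and small hand-written word sets stand in for corpus-derived co-occurrence statistics and a lexical resource of related words. The names `WEIGHTS`, `INTERCEPT`, `features`, `helpfulness`, and `rank_candidates` are hypothetical, and the coefficients are placeholders rather than values fitted to teacher data.

```python
import math
from collections import Counter

# Placeholder coefficients (hypothetical). In the thesis, coefficients are
# estimated by logistic regression on teacher judgements; these are not them.
INTERCEPT = 0.5
WEIGHTS = {
    "length": -0.15,       # token count: crude stand-in for syntactic complexity
    "similarity": 2.0,     # similarity to the sentence the word was met in
    "cooccurrences": 0.8,  # tokens that are significant co-occurrences of the target
    "related": 0.6,        # tokens that are semantically related to the target
}
PUNCT = ".,;:!?\"'()"


def tokenize(sentence):
    """Lowercase whitespace tokenization with surrounding punctuation stripped."""
    tokens = (t.strip(PUNCT).lower() for t in sentence.split())
    return [t for t in tokens if t]


def cosine(a, b):
    """Cosine similarity between two token lists treated as bags of words."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(count * cb[word] for word, count in ca.items())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def features(candidate, context, collocates, related):
    """Map a candidate sentence onto the four dimensions (simplified proxies)."""
    tokens = tokenize(candidate)
    return {
        "length": len(tokens),
        "similarity": cosine(tokens, tokenize(context)),
        "cooccurrences": sum(t in collocates for t in tokens),
        "related": sum(t in related for t in tokens),
    }


def helpfulness(feats):
    """Logistic model: estimated probability that the sentence is a helpful example."""
    z = INTERCEPT + sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_candidates(candidates, context, collocates, related):
    """Return (score, sentence) pairs, best candidate first."""
    scored = [(helpfulness(features(c, context, collocates, related)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Usage with the (hypothetical) target word "Gewitter".
context = "Nach dem heftigen Gewitter waren die Straßen überflutet."
candidates = [
    "Ein Gewitter zog auf.",
    "Dass das Gewitter, von dem die Zeitung, die wir gestern kauften, berichtete, "
    "so schlimm werden würde, hatte niemand erwartet.",
    "Das Gewitter brachte starken Regen und Blitze.",
]
collocates = {"regen", "blitze", "donner", "heftigen"}  # hypothetical co-occurrence list
related = {"regen", "sturm", "unwetter", "wolken"}       # hypothetical related-word list

ranked = rank_candidates(candidates, context, collocates, related)
for score, sentence in ranked:
    print(f"{score:.2f}  {sentence}")
```

With these placeholder weights, the short sentence that supplies co-occurring and related words ("Regen", "Blitze") ranks first, and the long sentence with nested relative clauses ranks last. A fitted model would replace the hand-set weights with coefficients estimated from teacher judgements of helpful and unhelpful examples.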
The results of the teacher evaluation are encouraging: for one of the models, the top-ranked selections perform on a par with dictionary examples. In addition, this model ranks potential examples in roughly the same order as experienced teachers of L2 German do. The student evaluation confirms and strengthens these findings: the best-performing model from the teacher evaluation significantly outperforms both random corpus selections and dictionary examples (when a penalty for missing entries is included).
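One standard way to quantify how closely a model's ranking of candidate examples matches a teacher's ranking is Spearman's rank correlation. The abstract does not state which agreement measure the thesis uses, so the following is only an illustration: `spearman_rho` is a hypothetical helper, the ranks for five candidate sentences are invented, and the closed-form formula shown is exact only for rankings without ties.

```python
def spearman_rho(model_ranks, teacher_ranks):
    """Spearman's rank correlation for two tie-free rankings of the same items.

    Uses rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the
    difference between the two ranks assigned to item i.
    """
    n = len(model_ranks)
    if n != len(teacher_ranks) or n < 2:
        raise ValueError("need two rankings of equal length >= 2")
    d_squared = sum((m - t) ** 2 for m, t in zip(model_ranks, teacher_ranks))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical ranks (1 = best) that a model and a teacher assign to the
# same five candidate example sentences.
model_ranks = [1, 2, 3, 4, 5]
teacher_ranks = [2, 1, 3, 5, 4]
rho = spearman_rho(model_ranks, teacher_ranks)
print(f"Spearman's rho = {rho:.2f}")  # prints "Spearman's rho = 0.80"
```

A value of 1 means identical orderings and -1 means reversed orderings; here two adjacent swaps still leave a high correlation, which is the kind of "rough correspondence" a ranking model can achieve without matching a teacher exactly.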