GDEX: Automatically Finding Good Dictionary Examples in a Corpus

Users appreciate examples. If a dictionary entry includes contextualized examples of the different senses a word may have, the user generally finds what they want quickly and straightforwardly. There are thus grounds for including many examples and contexts. Producing good examples, however, can be labour-intensive and therefore expensive. We automatically found good candidate sentences in a corpus for lexicographers to work with. The technology was used to add examples to an online version of a leading dictionary; we describe and evaluate the project. We also consider a range of other ways in which finding good examples can bridge the gap between corpora, dictionaries, and language learning.
