Predicting Learner Levels for Online Exercises of Hebrew

We develop a system for predicting the level of language learners from only a small amount of targeted language data. In particular, we focus on learners of Hebrew and predict their level from restricted placement exam exercises. As in many language teaching situations, a major problem is data sparsity, which we account for in our feature selection, in our learning algorithm, and in the setup. Specifically, we define a two-phase classification process: the first phase isolates individual errors and linguistic constructions, and the second phase aggregates them. Such a two-step process allows for easy integration of other exercises and features in the future, and the aggregation of information also allows us to smooth over sparse features.
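The two-phase idea can be illustrated with a minimal sketch. This is not the system's implementation: the tag names, the relative-frequency aggregation, and the nearest-neighbour prediction step are all assumptions chosen for illustration, since the abstract does not specify them. Phase 1 is assumed to have already tagged each response with the errors and constructions it contains; phase 2 aggregates those tags per learner and predicts a level from the aggregated vector.

```python
from collections import Counter
from math import dist

# Hypothetical phase-1 tags (illustrative names, not taken from the paper).
FEATURES = ["agreement_error", "spelling_error", "relative_clause"]

def aggregate(tagged_responses):
    """Phase 2 input: per-learner relative frequency of each phase-1 tag.

    Averaging over all of a learner's responses smooths over tags that
    are sparse in any single response.
    """
    counts = Counter(tag for tags in tagged_responses for tag in tags)
    n = max(len(tagged_responses), 1)  # avoid division by zero
    return [counts[f] / n for f in FEATURES]

def predict_level(vector, training):
    """Assumed learner: return the level of the nearest training vector."""
    return min(training, key=lambda example: dist(vector, example[0]))[1]

# Toy data, invented for the example.
training = [
    ([0.8, 0.6, 0.0], "beginner"),
    ([0.1, 0.1, 0.7], "advanced"),
]
learner = [
    ["agreement_error"],
    ["agreement_error", "spelling_error"],
    [],
    ["spelling_error"],
]

vector = aggregate(learner)
level = predict_level(vector, training)
```

Under this design, adding a new exercise or feature only means adding a phase-1 tag to `FEATURES`; the phase-2 step is unchanged.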
