Cost-Effectiveness in Building a Low-Resource Morphological Analyzer for Learner Language

In this paper, we describe the development of a morphological analyzer for learner Hungarian, outlining extensions to a resource-light system that can be developed by different types of experts. Specifically, we discuss linguistic rule writing, resource creation, and different system settings. Our evaluation shows how much improvement each level and kind of effort yields, enabling other researchers to spend their time and energy as effectively as possible.
