Paper Information - A Web Survey on the Use of Active Learning to Support Annotation of Text Data

As supervised machine learning methods for natural language processing (NLP) tasks prove increasingly viable, attention naturally shifts towards the creation of training data. The manual annotation of corpora is a tedious and time-consuming process, and obtaining high-quality annotated data is a bottleneck in machine learning for NLP today. Active learning is one way of easing the burden of annotation. This paper presents a first probe into the NLP research community concerning the nature of the annotation projects undertaken in general, and the use of active learning as annotation support in particular.

Fredrik Olsson | Katrin Tomanek