Paper Information - In-tool Learning for Selective Manual Annotation in Large Corpora

In-tool Learning for Selective Manual Annotation in Large Corpora

We present a novel approach to the selective annotation of large corpora through the use of machine learning. Linguistic search engines used to locate potential instances of an infrequent phenomenon do not support ranking the search results. This favors high-precision queries that return only a few results over broader queries with higher recall. Our approach introduces a classifier that ranks the search results and thus helps the annotator focus on the results most likely to be instances of the phenomenon in question, even for low-precision queries. The classifier is trained in an in-tool fashion: apart from preprocessing, it relies only on the manual annotations made by users in the querying tool itself. To implement this approach, we build upon CSniper, a web-based multi-user search and annotation tool.
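The ranking idea in the abstract can be illustrated with a minimal sketch. It is not the paper's classifier: it assumes a simple bag-of-words log-odds scorer as a stand-in, and the function names (`train`, `score`, `rank`) and the example sentences are hypothetical. It only shows how annotations collected in the tool could be used to reorder the results of a broad query.

```python
import math
from collections import Counter

def train(annotated):
    """Count tokens in results the annotator marked as instances vs. non-instances."""
    pos, neg = Counter(), Counter()
    for text, is_instance in annotated:
        (pos if is_instance else neg).update(text.lower().split())
    return pos, neg

def score(text, model):
    """Sum of add-one-smoothed log-odds per token; higher means more likely an instance."""
    pos, neg = model
    vocab = len(set(pos) | set(neg)) + 1  # +1 reserves mass for unseen tokens
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    total = 0.0
    for tok in text.lower().split():
        p = (pos[tok] + 1) / (n_pos + vocab)
        q = (neg[tok] + 1) / (n_neg + vocab)
        total += math.log(p / q)
    return total

def rank(results, model):
    """Order search results so the most promising candidates come first."""
    return sorted(results, key=lambda r: score(r, model), reverse=True)

# Hypothetical annotations for a query targeting it-clefts (illustrative data only).
annotated = [
    ("it was John who left", True),
    ("it was a cold day", False),
    ("it is Mary who sings", True),
    ("it is raining again", False),
]
model = train(annotated)

# Unranked results of a broad, low-precision query.
results = ["it was late", "it was Anna who called"]
ranked = rank(results, model)
```

In this toy setup the token "who" occurs only in positively annotated results, so the second candidate is scored higher and moves to the top of the list. A real system would retrain as annotators label more results and would use richer features than surface tokens.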

Iryna Gurevych | Erik-Lân Do Dinh | Richard Eckart de Castilho
