Modeling the Annotation Process for Ancient Corpus Creation

In corpus creation, human annotation is expensive. Annotation costs can be reduced through machine learning and active learning; however, there are many complex interactions among the machine learner, the active learning technique, the annotation cost, human annotation accuracy, the annotator user interface, and several other elements of the process. For example, we show that changing the way in which annotators are paid can drastically change the performance of active learning techniques. To date, these interactions have been poorly understood. We introduce a decision-theoretic model of the annotation process, suitable for ancient corpus annotation, that clarifies these interactions and can guide the development of a corpus creation project.
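To illustrate the kind of interaction at issue, consider a toy benefit-per-cost selection rule in which the payment scheme supplies the cost term. This is a minimal sketch, not the model introduced in the paper; the sentence lengths, benefit estimates, and cost functions below are hypothetical.

    # Toy sketch (hypothetical numbers, not the paper's model): how the payment
    # scheme changes which sentence a benefit-per-cost active learner selects.
    sentences = [
        # (id, length in words, estimated benefit to the model of annotating it)
        ("s1", 5, 0.10),
        ("s2", 40, 0.55),
        ("s3", 12, 0.40),
    ]

    def cost(length, scheme):
        """Hypothetical annotation cost under two payment schemes."""
        if scheme == "per_sentence":
            return 1.0               # flat rate regardless of length
        if scheme == "per_word":
            return 0.05 * length     # cost grows with sentence length
        raise ValueError(scheme)

    def best_query(scheme):
        """Select the sentence with the highest estimated benefit per unit cost."""
        return max(sentences, key=lambda s: s[2] / cost(s[1], scheme))

    print(best_query("per_sentence"))  # picks the long sentence s2: flat cost hides its length
    print(best_query("per_word"))      # picks the shorter s3: length now carries real cost

Under the flat per-sentence scheme the learner prefers the long, informative sentence; once cost scales with length, the same learner makes a different choice. Exposing and reasoning about interactions of this kind is what the decision-theoretic model is intended to support.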