Active Learning Selection Strategies for Information Extraction

The need for labeled documents is a key bottleneck in adaptive information extraction. One way to address this problem is through active learning algorithms, which ask users to label only the most informative documents. We investigate several document selection strategies that are particularly relevant to information extraction. We show that some strategies are biased toward recall and others toward precision, but that it is difficult to ensure both high recall and high precision. We also show that there is considerable scope for improved selection strategies, and we investigate the relationship between the documents selected and the relative performance of two strategies.