INCEpTION: Interactive machine-assisted annotation

The demand for high-quality annotated text corpora in science and industry has risen sharply in recent years. To address this need, we introduce INCEpTION, a web-based platform for efficient text annotation. The platform generically supports use cases that require span or relation annotations as well as entity and fact linking. To the best of our knowledge, INCEpTION is the first text annotation platform to combine interactive annotation support, knowledge management, and extensibility.

Annotation support. To minimize the required human effort and to increase annotation speed and quality, possible annotations are suggested by machine learning algorithms, so-called recommenders. When the user accepts, rejects or corrects these suggestions, this feedback is used to update the recommender model in the background. This interactive process creates a tight feedback loop between human and machine that continually improves the suggestions. A non-obtrusive active learning mode can be used to navigate the suggestions in order of the largest estimated improvement in recommendation quality. State-of-the-art generic annotation tools support only static pre-annotations or require manual intervention to trigger re-training, making INCEpTION the first generic tool to offer interactive annotation support.

Knowledge management. INCEpTION supports entity and fact linking with knowledge bases (KBs), a common requirement for tasks like cross-document text discovery/exploration or knowledge base population/completion. Recommendation support is also implemented for entity linking: for existing named entity annotations, suitable KB entries to link with are suggested. When annotating new named entities, auto-completion with contextual re-ranking helps the user find the right entity, even in very large knowledge bases like Wikidata. While some existing applications also support entity and fact linking, editing the knowledge base can often be done only in a different tool.
Integrating knowledge management in INCEpTION makes it possible to perform KB population and KB completion tasks during the annotation process.

Extensibility. By using an open and extensible architecture, INCEpTION aims to induce a shift from implementing new annotation tools for new corpus projects towards adding project-specific functionality through customization. INCEpTION provides many extension points where custom Java code can easily be called through events or dependency injection. Additionally, web-service APIs are provided, e.g. to support external annotation web services as custom recommenders. The latter allows users to reuse already trained machine learning models, even in different programming