Towards an Active Learning System for Company Name Disambiguation in Microblog Streams

In this paper, we describe the joint participation of UvA and UNED in RepLab 2013. We propose an active learning approach to the filtering subtask that uses features based on the semantics detected in each tweet (obtained by entity linking to Wikipedia), as well as tweet-inherent features such as hashtags and usernames. The tweets manually inspected during the active learning process amount to at most 1% of the test data. Although our baseline performs poorly, active learning improves the results.
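The abstract names neither the classifier nor the query strategy. The following is therefore only a minimal sketch of pool-based active learning under a 1% annotation budget, assuming a multinomial Naive Bayes classifier and uncertainty sampling (querying the tweet whose probability of being related to the company is closest to 0.5). The helpers `tweet_features`, `NaiveBayes` and `active_learning` are hypothetical, and entity linking output is assumed to arrive as a plain list of Wikipedia titles.

```python
import math
from collections import Counter


def tweet_features(text, linked_entities):
    """Hypothetical extractor: linked Wikipedia entities plus hashtags and usernames."""
    feats = ["entity:" + e for e in linked_entities]  # assumed entity-linking output
    for tok in text.split():
        if tok.startswith("#") and len(tok) > 1:
            feats.append("hashtag:" + tok[1:].lower())
        elif tok.startswith("@") and len(tok) > 1:
            feats.append("user:" + tok[1:].lower())
    return feats


class NaiveBayes:
    """Assumed classifier: multinomial Naive Bayes, add-one smoothing; label 1 = related."""

    def fit(self, examples):
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter()
        for feats, label in examples:
            self.counts[label].update(feats)
            self.docs[label] += 1
        # +1 reserves probability mass for features never seen in training.
        self.vocab_size = len(set(self.counts[0]) | set(self.counts[1])) + 1
        self.totals = {c: sum(self.counts[c].values()) for c in (0, 1)}
        return self

    def prob_related(self, feats):
        n = self.docs[0] + self.docs[1]
        log_p = {}
        for c in (0, 1):
            lp = math.log((self.docs[c] + 1) / (n + 2))  # smoothed class prior
            for f in feats:
                lp += math.log((self.counts[c][f] + 1) / (self.totals[c] + self.vocab_size))
            log_p[c] = lp
        m = max(log_p.values())  # subtract max for numerical stability
        p0, p1 = math.exp(log_p[0] - m), math.exp(log_p[1] - m)
        return p1 / (p0 + p1)


def active_learning(seed, pool, oracle, budget_percent=1):
    """Uncertainty sampling that queries at most budget_percent% of the pool."""
    labeled = list(seed)
    pool = list(pool)  # items are (tweet_id, features)
    budget = len(pool) * budget_percent // 100
    queried = []
    for _ in range(budget):
        if not pool:
            break
        model = NaiveBayes().fit(labeled)
        # Most uncertain tweet: P(related) closest to 0.5.
        idx = min(range(len(pool)),
                  key=lambda i: abs(model.prob_related(pool[i][1]) - 0.5))
        tweet_id, feats = pool.pop(idx)
        labeled.append((feats, oracle(tweet_id)))  # manual inspection step
        queried.append(tweet_id)
    return NaiveBayes().fit(labeled), queried


if __name__ == "__main__":
    seed = [(tweet_features("new #iPhone from @Apple", ["Apple_Inc."]), 1),
            (tweet_features("grandma's #applepie recipe", ["Apple_pie"]), 0)]
    pool = [(i, tweet_features("tweet #iphone" if i % 2 == 0 else "tweet #applepie", []))
            for i in range(200)]
    model, queried = active_learning(seed, pool, oracle=lambda i: int(i % 2 == 0))
    print(len(queried))  # 2, i.e. 1% of a 200-tweet pool
```

Retraining after every query keeps the loop simple; with a real stream one would more likely query in small batches to amortize training cost.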