Harvesting Relations from the Web - Quantifiying the Impact of Filtering Functions

Several bootstrapping-based relation extraction algorithms working on large corpora or on the Web have been presented in the literature. A crucial issue for such algorithms is to avoid the introduction of too much noise into further iterations. Typically, this is achieved by applying appropriate pattern and tuple evaluation measures, henceforth called filtering functions, thereby selecting only the most promising patterns and tuples. In this paper, we systematically compare different filtering functions proposed across the literature. Although we also discuss our own implementation of a pattern learning algorithm, the main contribution of the paper is actually the extensive comparison and evaluation of the different filtering functions proposed in the literature with respect to seven datasets. Our results indicate that some of the commonly used filters do not outperform a trivial baseline filter in a statistically significant manner.