Paper information - Harvesting Relations from the Web - Quantifying the Impact of Filtering Functions

Harvesting Relations from the Web - Quantifying the Impact of Filtering Functions

Several bootstrapping-based relation extraction algorithms that work on large corpora or on the Web have been presented in the literature. A crucial issue for such algorithms is to avoid introducing too much noise into later iterations. Typically, this is achieved by applying pattern and tuple evaluation measures, henceforth called filtering functions, that select only the most promising patterns and tuples. In this paper, we systematically compare different filtering functions proposed in the literature. Although we also discuss our own implementation of a pattern learning algorithm, the main contribution of the paper is the extensive comparison and evaluation of these filtering functions on seven datasets. Our results indicate that some of the commonly used filters do not outperform a trivial baseline filter in a statistically significant manner.
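To make the role of a filtering function concrete, here is a minimal sketch of one bootstrapping iteration. It is not the implementation or any of the filters evaluated in the paper: the function names, the toy corpus, and the support-count score (the number of distinct known tuples that back a pattern) are all assumptions chosen for illustration.

```python
import re
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def induce_patterns(corpus: List[str], tuples: Set[Pair]) -> Dict[str, Set[Pair]]:
    """Map each candidate pattern to the set of known tuples that produced it."""
    found: Dict[str, Set[Pair]] = {}
    for sentence in corpus:
        for x, y in tuples:
            if x in sentence and y in sentence:
                # Generalize the sentence by replacing the arguments with slots.
                pattern = sentence.replace(x, "<X>", 1).replace(y, "<Y>", 1)
                found.setdefault(pattern, set()).add((x, y))
    return found


def support_filter(candidates: Dict[str, Set[Pair]], k: int) -> List[str]:
    """Hypothetical filtering function: keep the k patterns backed by the
    most distinct known tuples (ties broken alphabetically)."""
    return sorted(candidates, key=lambda p: (-len(candidates[p]), p))[:k]


def apply_patterns(corpus: List[str], patterns: List[str]) -> Set[Pair]:
    """Extract tuples by matching each kept pattern against every sentence."""
    extracted: Set[Pair] = set()
    for pattern in patterns:
        regex = (
            re.escape(pattern)
            .replace(re.escape("<X>"), r"(?P<x>\w+)", 1)
            .replace(re.escape("<Y>"), r"(?P<y>\w+)", 1)
        )
        for sentence in corpus:
            match = re.fullmatch(regex, sentence)
            if match:
                extracted.add((match.group("x"), match.group("y")))
    return extracted


def bootstrap_step(corpus: List[str], tuples: Set[Pair], k: int) -> Set[Pair]:
    """One iteration: induce patterns, filter them, extract new tuples."""
    kept = support_filter(induce_patterns(corpus, tuples), k)
    return tuples | apply_patterns(corpus, kept)


# Toy data (made up for this sketch).
corpus = [
    "Paris is the capital of France",
    "Berlin is the capital of Germany",
    "Rome is the capital of Italy",
    "Paris hosted a summit with France",
    "Madrid hosted a summit with Portugal",
]
seeds = {("Paris", "France"), ("Berlin", "Germany")}

filtered = bootstrap_step(corpus, seeds, k=1)
unfiltered = bootstrap_step(corpus, seeds, k=2)
```

With `k=1` only the capital-of pattern survives, so the step adds `("Rome", "Italy")` and nothing else. With `k=2` the weakly supported summit pattern is also applied and the noisy tuple `("Madrid", "Portugal")` enters the tuple set, which is the kind of drift that filtering is meant to limit.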

Philipp Cimiano | Egon Stemle | Sebastian Blohm

[1] Eduard H. Hovy, et al. Learning surface text patterns for a Question Answering System, 2002, ACL.

[2] Sergey Brin, et al. Extracting Patterns and Relations from the World Wide Web, 1998, WebDB.

[3] Steffen Staab, et al. Towards the self-annotating web, 2004, WWW '04.

[4] Patrick Pantel, et al. Espresso: Leveraging Generic Patterns for Automatically Harvesting Semantic Relations, 2006, ACL.

[5] Daniel Jurafsky, et al. Learning Syntactic Patterns for Automatic Hypernym Discovery, 2004, NIPS.

[6] Doug Downey, et al. Unsupervised named-entity extraction from the Web: An experimental study, 2005, Artif. Intell.

[7] Doug Downey, et al. Learning text patterns for web information extraction and assessment, 2004, AAAI 2004.

[8] Doug Downey, et al. A Probabilistic Model of Redundancy in Information Extraction, 2005, IJCAI.

[9] Luis Gravano, et al. Snowball: extracting relations from large plain-text collections, 2000, DL '00.

[10] Marti A. Hearst. Automatic Acquisition of Hyponyms from Large Text Corpora, 1992, COLING.