Sequential Supervised Learning for Hypernym Discovery from Wikipedia

Hypernym discovery is an essential task for building and extending ontologies automatically. Compared with the Web as a whole as a source for information extraction, online encyclopedias offer far more structure and reliability. In this paper we propose a novel approach that combines syntactic and lexical-semantic information to identify hypernymic relationships. Using the first sentences of articles from the German version of Wikipedia, we compiled both semi-automatically and manually created training data, as well as a gold standard for evaluation. We trained a sequential supervised learner with a semantically enhanced tagset. The experiments showed that the cleanliness of the data is far more important than its quantity. Furthermore, they showed that bootstrapping is a viable way to improve the results. Our approach outperformed the competing lexico-syntactic patterns by 7%, reaching an F1-measure of over 0.91.
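The abstract does not spell out the tagset or the scoring protocol. As a minimal sketch, assume each token of a first sentence carries an ordinary STTS-style part-of-speech tag, except that the noun naming the hypernym receives a hypothetical semantic tag `HYP` (this tag, the helper names `extract_hypernym` and `f1_score`, and the toy sentences are illustrative assumptions, not the paper's tagset or data). Once a sequential tagger has produced such a sequence, the hypernym can be read off the tagged output, and the predictions can be scored against a gold standard with the F1-measure, the harmonic mean of precision and recall:

```python
from typing import Dict, List, Optional, Tuple

# A tagged sentence is a list of (token, tag) pairs.
# ASSUMPTION: "HYP" is a hypothetical tag marking the hypernym noun;
# the remaining STTS-style tags stand in for ordinary POS tags.
Tagged = List[Tuple[str, str]]

HYPERNYM_TAG = "HYP"  # hypothetical, not taken from the paper


def extract_hypernym(sentence: Tagged) -> Optional[str]:
    """Return the first token tagged as hypernym, or None if no token is."""
    for token, tag in sentence:
        if tag == HYPERNYM_TAG:
            return token
    return None


def f1_score(predicted: Dict[str, Optional[str]],
             gold: Dict[str, str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted (term -> hypernym) pairs vs. gold."""
    # Only terms for which the system proposed a hypernym count as predictions.
    proposed = {t: h for t, h in predicted.items() if h is not None}
    correct = sum(1 for t, h in proposed.items() if gold.get(t) == h)
    precision = correct / len(proposed) if proposed else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy first sentences of German Wikipedia articles, already tagged.
# In the real setting a trained sequential tagger would assign these tags.
tagged = {
    "Berlin": [("Berlin", "NE"), ("ist", "VAFIN"), ("die", "ART"),
               ("Hauptstadt", "HYP"), ("Deutschlands", "NE"), (".", "$.")],
    "Dackel": [("Der", "ART"), ("Dackel", "NN"), ("ist", "VAFIN"),
               ("eine", "ART"), ("Hunderasse", "HYP"), (".", "$.")],
    # Here the tagger missed the hypernym, so nothing is extracted.
    "Mozart": [("Mozart", "NE"), ("war", "VAFIN"), ("ein", "ART"),
               ("Komponist", "NN"), (".", "$.")],
}
gold = {"Berlin": "Hauptstadt", "Dackel": "Hunderasse", "Mozart": "Komponist"}

predicted = {term: extract_hypernym(sent) for term, sent in tagged.items()}
p, r, f = f1_score(predicted, gold)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # prints P=1.00 R=0.67 F1=0.80
```

Counting terms without any proposed hypernym against recall but not against precision is one common convention; the paper's exact evaluation protocol may differ.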