Paper information - Ontology Learning from Text: A Survey of Methods

After the vision of the Semantic Web was broadcast at the turn of the millennium, ontology became a synonym for the solution to many problems arising from the fact that computers do not understand human language: if there were an ontology, and every document were marked up with it, and we had agents that understood the markup, then computers would finally be able to process our queries in a truly sophisticated way. Some years later, the success of Google shows us that the vision has not come true, being hampered by the enormous amount of extra work required for the intellectual encoding of semantic mark-up, as compared to simply uploading an HTML page. To alleviate this acquisition bottleneck, the field of ontology learning has since emerged as an important sub-field of ontology engineering. It is widely accepted that ontologies can facilitate text understanding and automatic processing of textual resources. Moving from words to concepts not only mitigates data sparseness issues, but also promises appealing solutions to polysemy and homonymy by finding unambiguous concepts that may map to various realizations in (possibly ambiguous) words. Numerous applications using lexical-semantic databases like WordNet (Miller, 1990) and its non-English counterparts, e.g. EuroWordNet (Vossen, 1997) or CoreNet (Choi and Bae, 2004), demonstrate the utility of semantic resources for natural language processing. Learning semantic resources from text instead of creating them manually might be risky in terms of correctness, but has undeniable advantages: creating resources for text processing from the texts to be processed fits the semantic component neatly and directly to them, which is never possible with general-purpose resources. Further, the cost per entry is greatly reduced, giving rise to much larger resources than an advocate of a manual approach could ever afford.
On the other hand, none of the methods used today is good enough for creating semantic resources of any kind in a completely unsupervised fashion, although automatic methods can facilitate manual construction to a large extent. The term ontology is understood in a variety of ways and has been used in philosophy for many centuries. In contrast, the notion of ontology in computer science is younger, but is used almost as inconsistently when it comes to the details of the definition. The intention of this essay is to give an overview of different methods that learn ontologies or ontology-like structures from unstructured text. Ontology learning from other sources, issues in description languages, ontology editors, ontology merging and ontology evolution are beyond the scope of this article. Surveys on ontology learning from text and other sources can be found in Ding and Foo (2002) and Gomez-Perez

Christian Biemann

[1] Gerhard Paass, et al. Learning Prototype Ontologies by Hierachical Latent Semantic Analysis, 2004, LWA.

[2] F. Dornseiff, et al. Der deutsche Wortschatz nach Sachgruppen, 2020.

[3] Sergey Brin, et al. Extracting Patterns and Relations from the World Wide Web, 1998, WebDB.

[4] Carlo Strapparava, et al. Domain Kernels for Word Sense Disambiguation, 2005, ACL.

[5] Douglas B. Lenat, et al. CYC: a large-scale investment in knowledge infrastructure, 1995, CACM.

[6] Sharon A. Caraballo. Automatic construction of a hypernym-labeled noun hierarchy from text, 1999, ACL.

[7] Adam Kilgarriff, et al. SENSEVAL: an exercise in evaluating word sense disambiguation programs, 1998, LREC.

[8] David Sánchez, et al. Web-scale taxonomy learning, 2005.

[9] George A. Miller, et al. Introduction to WordNet: An On-line Lexical Database, 1990.

[10] George W. Davidson, et al. Roget's Thesaurus of English Words and Phrases, 1982.

[11] Christian Biemann, et al. Semiautomatic Extension of CoreNet using a Bootstrapping Mechanism on Corpus-based Co-occurrences, 2004, COLING.