Paper Information - Automatically Inducing Ontologies from Corpora

Automatically Inducing Ontologies from Corpora

The emergence of vast quantities of on-line information has raised the importance of methods for automatic cataloguing of information in a variety of domains, including electronic commerce and bioinformatics. Ontologies can play a critical role in such cataloguing. In this paper, we describe a system that automatically induces an ontology from any large on-line text collection in a specific domain. The ontology that is induced consists of domain concepts, related by kind-of and part-of links. To achieve domain-independence, we use a combination of relatively shallow methods along with any available repositories of applicable background knowledge. We describe our evaluation experiences using these methods, and provide examples of induced structures.
