Automatically Inducing Ontologies from Corpora

The emergence of vast quantities of on-line information has raised the importance of methods for automatically cataloguing information in a variety of domains, including electronic commerce and bioinformatics. Ontologies can play a critical role in such cataloguing. In this paper, we describe a system that automatically induces an ontology from any large on-line text collection in a specific domain. The induced ontology consists of domain concepts related by kind-of and part-of links. To achieve domain independence, we combine relatively shallow methods with any available repositories of applicable background knowledge. We describe our experience evaluating these methods and provide examples of induced structures.
