An Unsupervised Method for Ontology Population from the Web

Automatically constructing and populating domain ontologies remains difficult for knowledge engineers, mainly because of the well-known knowledge acquisition bottleneck. In this paper, we address this problem by proposing an iterative, unsupervised approach for identifying and extracting ontological class instances from the Web. The approach treats the Web as a large corpus and relies on a confidence-weighted metric that uses semantic measures and web-scale statistics as evidence. Moreover, the iterative method can learn, to some extent, domain-specific linguistic patterns for extracting ontological class instances. We obtained encouraging results for the final ranking of candidate instances, and the patterns found by our method reached an accuracy of up to 97%.
