A redundancy-based method for the extraction of relation instances from the Web

The Semantic Web requires automatic ontology population methods. We developed an approach that, given existing ontologies, extracts instances of ontology relations, a specific subtask of ontology population. We use generic, domain-independent techniques to extract candidate relation instances from the Web and exploit the redundancy of information on the Web to compensate for the loss of precision that these generic methods cause. The candidate relation instances are then ranked by their co-occurrence with a small seed set. In an experiment, we extracted instances of the relation between artists and art styles and manually evaluated the results against selected art resources. We also tested the method in the football domain and compared the performance of our ranking with that of a method based on Google hit counts.
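
The ranking step can be illustrated with a minimal sketch. The abstract does not give the scoring function, so the rule used here (a candidate gains one point per seed instance it shares a page with) is an assumption, as are the function name `rank_candidates` and the toy page data, none of which come from the paper.

```python
from collections import Counter


def rank_candidates(pages, seeds):
    """Rank candidate instances by co-occurrence with a seed set.

    pages: iterable of sets, each holding the candidate strings
           extracted from one Web page (hypothetical input format).
    seeds: set of instances already known to be correct.
    """
    scores = Counter()
    for found in pages:
        seed_hits = len(found & seeds)
        if seed_hits == 0:
            # A page without any seed gives no evidence under this rule.
            continue
        for candidate in found - seeds:
            scores[candidate] += seed_hits
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data, invented for illustration only.
pages = [
    {"Claude Monet", "Pierre-Auguste Renoir", "Edgar Degas"},
    {"Claude Monet", "Edgar Degas", "Contact Us"},
    {"Contact Us", "Site Map"},
    {"Pierre-Auguste Renoir", "Edgar Degas", "Camille Pissarro"},
]
seeds = {"Claude Monet", "Pierre-Auguste Renoir"}
ranking = rank_candidates(pages, seeds)
```

In this toy input, a candidate that repeatedly appears alongside seeds rises to the top, while strings picked up by the generic extraction step that rarely or never co-occur with seeds score low or drop out. This is the sense in which redundancy can offset the low precision of generic extraction.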
