Ontology-Driven Information Extraction with OntoSyphon

The Semantic Web’s need for machine-understandable content has led researchers to attempt to automatically acquire such content from a number of sources, including the web. To date, such research has focused on “document-driven” systems that individually process a small set of documents, annotating each with respect to a given ontology. This paper introduces OntoSyphon, an alternative that strives to more fully leverage existing ontological content while scaling to extract comparatively shallow content from millions of documents. OntoSyphon operates in an “ontology-driven” manner: taking any ontology as input, OntoSyphon uses the ontology to specify web searches that identify possible semantic instances, relations, and taxonomic information. Redundancy in the web, together with information from the ontology, is then used to automatically verify these candidate instances and relations, enabling OntoSyphon to operate in a fully automated, unsupervised manner. A prototype of OntoSyphon is fully implemented, and we present experimental results demonstrating substantial instance learning in a variety of domains, based on independently constructed ontologies. We also introduce new methods for improving instance verification and demonstrate that they improve upon previously known techniques.
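The ontology-driven loop described above has two stages. First, class labels from the ontology are turned into search phrases. Second, candidate instances found in the results are kept only if the web repeats them often enough. The Python sketch below illustrates that idea under stated assumptions. The pattern templates (in the style of Hearst's lexico-syntactic patterns), the function names `make_queries`, `extract_candidates`, and `verify`, the capitalization heuristic, and the `min_support` threshold are all hypothetical. The plain support count is a naive stand-in for redundancy-based verification, not OntoSyphon's own scoring method. Snippets are supplied by hand instead of being fetched from a search engine.

```python
import re
from collections import Counter

# Hypothetical lexico-syntactic templates in the spirit of Hearst (1992).
# {c} stands for a plural label taken from an ontology class.
PATTERN_TEMPLATES = ["{c} such as", "such {c} as", "{c} including", "{c} especially"]

# A run of capitalized words, used as a crude proper-noun detector (assumption).
PROPER_NOUN = re.compile(r"[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*")
# List separators: ", and" / ", or" / "and" / "or", or a bare comma.
LIST_SEP = re.compile(r",?\s+(?:and|or)\s+|,\s*")


def make_queries(class_labels):
    """Map each plural class label to the phrase queries built from it."""
    return {c: [t.format(c=c) for t in PATTERN_TEMPLATES] for c in class_labels}


def extract_candidates(snippet, query):
    """Return proper nouns enumerated right after `query` in a text snippet."""
    m = re.search(re.escape(query) + r"\s+(.*)", snippet, flags=re.IGNORECASE)
    if not m:
        return []
    tail = re.split(r"[.;]", m.group(1))[0]  # stay within the current sentence
    found = []
    for part in LIST_SEP.split(tail):
        head = PROPER_NOUN.match(part.strip())
        if not head:
            break  # the enumeration has ended
        found.append(head.group(0))
    return found


def verify(snippets_by_query, min_support=2):
    """Keep (instance, class) pairs seen in at least `min_support` distinct snippets.

    This naive redundancy count is a placeholder for a real verification score.
    """
    support = Counter()
    for (cls, query), snippets in snippets_by_query.items():
        for s in set(snippets):
            for inst in set(extract_candidates(s, query)):
                support[(inst, cls)] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}


if __name__ == "__main__":
    print(make_queries(["cities"]))
    # Toy snippets standing in for search-engine results (not real data).
    toy = {
        ("cities", "cities such as"): [
            "Large cities such as Paris, London, and Rome attract tourists.",
            "European cities such as Paris and Berlin.",
        ],
        ("cities", "cities including"): [
            "We visited many cities including Paris and Madrid.",
        ],
    }
    print(verify(toy))  # {('Paris', 'cities'): 3}
```

In the toy run, only "Paris" appears in enough independent snippets to pass the threshold. London, Rome, Berlin, and Madrid are each seen once and are discarded. This shows why redundancy helps precision, and also why a fixed count threshold can cost recall for rarer instances.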
