Hierarchical, perceptron-like learning for ontology-based information extraction

Recent work on ontology-based Information Extraction (IE) has tried to use knowledge from the target ontology to improve semantic annotation results. However, very few approaches exploit the structure of the ontology itself, and those that do have some limitations. This paper introduces a hierarchical learning approach for IE that makes the target ontology an essential part of the extraction process by taking into account the relations between concepts. The approach is evaluated on the largest available semantically annotated corpus. The results clearly demonstrate the benefits of using knowledge from the ontology as input to the information extraction process. We also demonstrate the advantages of our approach over other state-of-the-art learning systems on a commonly used benchmark dataset.
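A hierarchical, perceptron-like learner can use the ontology's concept hierarchy as part of the model itself. The sketch below is a generic hierarchical perceptron, not the paper's exact algorithm. Each concept gets its own weight vector, and a concept's score is the sum of the vectors along its path from the root, so related concepts share evidence through their common ancestors. On a mistake, only the concepts where the true and predicted paths diverge are updated. The toy ontology, the feature vectors, the unit step size and every name (`ONTOLOGY`, `path_to_root`, `HierarchicalPerceptron`) are hypothetical. The paper's actual update rule, including its step size and margin handling, may differ.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical toy ontology, stored as child -> parent (None marks the root).
ONTOLOGY: Dict[str, Optional[str]] = {
    "Entity": None,
    "Location": "Entity",
    "City": "Location",
    "Country": "Location",
    "Person": "Entity",
}


def path_to_root(node: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the concepts on the path from the root down to `node`."""
    nodes: List[str] = []
    current: Optional[str] = node
    while current is not None:
        nodes.append(current)
        current = parents[current]
    return nodes[::-1]


class HierarchicalPerceptron:
    """One weight vector per concept; a concept's score sums the vectors on its root path."""

    def __init__(self, parents: Dict[str, Optional[str]], n_features: int) -> None:
        self.parents = parents
        self.w: Dict[str, List[float]] = {c: [0.0] * n_features for c in parents}

    def score(self, x: Sequence[float], concept: str) -> float:
        # Ancestors contribute to every descendant's score, so related concepts share evidence.
        return sum(
            sum(wi * xi for wi, xi in zip(self.w[v], x))
            for v in path_to_root(concept, self.parents)
        )

    def predict(self, x: Sequence[float]) -> str:
        # Ties are broken by dictionary order (the first concept listed wins).
        return max(self.parents, key=lambda c: self.score(x, c))

    def update(self, x: Sequence[float], y: str) -> bool:
        """Mistake-driven perceptron update with a unit step; returns True on a mistake."""
        y_hat = self.predict(x)
        if y_hat == y:
            return False
        true_path = set(path_to_root(y, self.parents))
        pred_path = set(path_to_root(y_hat, self.parents))
        # Only concepts where the two paths diverge change; shared ancestors are left alone.
        for v in true_path - pred_path:
            self.w[v] = [wi + xi for wi, xi in zip(self.w[v], x)]
        for v in pred_path - true_path:
            self.w[v] = [wi - xi for wi, xi in zip(self.w[v], x)]
        return True

    def train(self, data: List[Tuple[Sequence[float], str]], epochs: int = 5) -> None:
        for _ in range(epochs):
            mistakes = sum(self.update(x, y) for x, y in data)
            if mistakes == 0:  # stop once every training example is classified correctly
                break


# Usage with hypothetical one-hot features for three annotated tokens.
data = [([1, 0, 0], "City"), ([0, 1, 0], "Country"), ([0, 0, 1], "Person")]
model = HierarchicalPerceptron(ONTOLOGY, n_features=3)
model.train(data)
print([model.predict(x) for x, _ in data])  # ['City', 'Country', 'Person']
```

Here both the City and Country examples push weight into their shared parent `Location`. This illustrates how the hierarchy lets sibling concepts pool evidence rather than being learned as unrelated flat classes.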
