NLP Techniques for Term Extraction and Ontology Population

This chapter investigates NLP techniques for ontology population that combine rule-based approaches with machine learning. We first describe a method for term recognition that uses linguistic and statistical techniques and draws on contextual information to bootstrap learning. We then examine how term recognition techniques, in particular similarity metrics and contextual information, can support the wider task of information extraction. We describe two tools we have developed that use contextual information to aid the development of rules for named entity recognition. Finally, we evaluate our ontology-based information extraction results with a novel technique that applies similarity-based metrics first developed for term recognition.
