Grammatical Inference in Practice: A Case Study in the Biomedical Domain

In this paper, we discuss an approach to named entity recognition (NER) based on grammatical inference (GI). Previous GI approaches have aimed at constructing the grammar underlying a given text source. It has been noted, however, that the rules produced by GI can also be interpreted semantically [16]: a non-terminal describes a set of interchangeable elements that are instances of the same concept. This observation leads to the hypothesis that GI might be useful for finding concept instances in text. It should also be possible to discover relations between concepts or, more precisely, the way such relations are expressed linguistically. We propose a general framework for using GI for named entity recognition and discuss several possible approaches within it. In addition, we demonstrate, using an existing GI tool, that these methods work successfully on biomedical data.

[1]  Dayne Freitag, et al. Using grammatical inference to improve precision in information extraction, 1997, ICML 1997.

[2]  Menno van Zaanen, et al. Computational Grammar Induction for Linguists, 2004, Grammars.

[3]  Georgios Paliouras, et al. Navigation, 2022.

[4]  Marti A. Hearst. Automatic Acquisition of Hyponyms from Large Text Corpora, 1992, COLING.

[5]  Walter Daelemans, et al. Automatic Initiation of an Ontology, 2004, CoopIS/DOA/ODBASE.

[6]  Martin Romacker, et al. An Integrated Model of Semantic and Conceptual Interpretation from Dependency Structures, 2022.

[7]  Georgios Paliouras, et al. Combining Information Extraction Systems Using Voting and Stacked Generalization, 2005, J. Mach. Learn. Res.

[8]  Mark Craven, et al. Constructing Biological Knowledge Bases by Extracting Information from Text Sources, 1999, ISMB.

[9]  Eytan Ruppin, et al. Motif extraction and protein classification, 2005, IEEE Computational Systems Bioinformatics Conference (CSB'05).

[10]  Daniel Jurafsky, et al. Semantic Role Chunking Combining Complementary Syntactic Views, 2005, CoNLL.

[11]  Pieter W. Adriaans, et al. Learning Relations from Biomedical Corpora Using Dependency Tree Levels, 2006.

[12]  Robert Meersman, et al. On the Move to Meaningful Internet Systems 2004: CoopIS, DOA, and ODBASE, 2004, Lecture Notes in Computer Science.

[13]  Eytan Ruppin, et al. Automatic Acquisition and Efficient Representation of Syntactic Structures, 2002, NIPS.

[14]  Menno van Zaanen, et al. Alignment-based learning versus EMILE: A comparison, 2001.

[15]  Ellen Riloff, et al. A Bootstrapping Method for Learning Semantic Lexicons using Extraction Pattern Contexts, 2002, EMNLP.

[16]  Pieter W. Adriaans, et al. Learning Relations from Biomedical Corpora Using Dependency Trees, 2006, KDECB.

[17]  Arthur Stutt, et al. Engineering Knowledge in the Age of the Semantic Web, 2004, Lecture Notes in Computer Science.

[18]  Emmanuel Cartier, et al. Use of Ontologies for Cross-lingual Information Management in the Web, 2003.

[19]  Nigel Collier, et al. Introduction to the Bio-entity Recognition Task at JNLPBA, 2004, NLPBA/BioNLP.

[20]  Andrew Roberts, et al. The use of corpora for automatic evaluation of grammar inference systems, 2003.

[21]  Thomas G. Dietterich. Multiple Classifier Systems, 2000, Lecture Notes in Computer Science.

[22]  George A. Vouros, et al. Enhancing Ontological Knowledge Through Ontology Population and Enrichment, 2004, EKAW.