A system for ‘tagging’ words with their part-of-speech (POS) tags is constructed. The system has two components: a lexicon containing the set of possible POS tags for each word, and rules that use a word's context to eliminate possible tags for that word. The Inductive Logic Programming (ILP) system Progol is used to induce these rules in the form of definite clauses. The final theory contained 885 clauses. For background knowledge, Progol uses a simple grammar in which the tags are terminals and predicates such as nounp (noun phrase) are non-terminals. Progol was altered to allow caching of information about clauses generated during the induction process, which greatly increased efficiency. The system achieved a per-word accuracy of 96.4% on known words drawn from sentences without quotation marks. This is on a par with other tagging systems induced from the same data [5, 2, 4], which all have accuracies in the range 96–97%. The per-sentence accuracy was 49.5%.
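The two-component design (a lexicon of candidate tags plus context rules that eliminate candidates) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy lexicon, the Penn-style tag names, the two hand-written rules, and the helper `eliminate_tags` are hypothetical and are not the clauses induced by Progol or the lexicon used by the system.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical toy lexicon: each known word maps to its set of possible tags.
LEXICON: Dict[str, Set[str]] = {
    "the": {"DT"},
    "can": {"MD", "NN", "VB"},
    "rusts": {"VBZ", "NNS"},
}

# A rule looks at the candidate sets and a position, and returns a tag to
# eliminate at that position, or None if the rule does not apply.
Rule = Callable[[List[Set[str]], int], Optional[str]]


def no_modal_after_determiner(cands: List[Set[str]], i: int) -> Optional[str]:
    # Illustrative rule: a modal cannot directly follow an unambiguous determiner.
    if i > 0 and cands[i - 1] == {"DT"} and "MD" in cands[i]:
        return "MD"
    return None


def no_base_verb_after_determiner(cands: List[Set[str]], i: int) -> Optional[str]:
    # Illustrative rule: a base-form verb cannot directly follow an unambiguous determiner.
    if i > 0 and cands[i - 1] == {"DT"} and "VB" in cands[i]:
        return "VB"
    return None


RULES: List[Rule] = [no_modal_after_determiner, no_base_verb_after_determiner]


def eliminate_tags(
    words: List[str], lexicon: Dict[str, Set[str]], rules: List[Rule]
) -> List[Set[str]]:
    """Start from the lexicon's candidates and apply rules until nothing changes."""
    # Known words only: an unknown word raises KeyError in this sketch.
    cands = [set(lexicon[w.lower()]) for w in words]
    changed = True
    while changed:
        changed = False
        for i in range(len(cands)):
            for rule in rules:
                tag = rule(cands, i)
                # Never remove the last remaining candidate for a word.
                if tag is not None and tag in cands[i] and len(cands[i]) > 1:
                    cands[i].discard(tag)
                    changed = True
    return cands


result = eliminate_tags(["the", "can", "rusts"], LEXICON, RULES)
```

In this sketch the two rules reduce the candidates for "can" to the noun reading, while "rusts" stays ambiguous because no rule applies to it. How the actual system resolves any ambiguity left after elimination is not stated in the abstract.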
[1] Atro Voutilainen et al. Inducing constraint grammars. ICGI, 1996.
[2] Lawrence R. Rabiner et al. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 1989.
[3] Atro Voutilainen et al. Tagging accurately - Don't guess if you know. ANLP, 1994.
[4] Steven Abney et al. Part-of-Speech Tagging and Partial Parsing. 1997.
[5] Eric Brill et al. Some Advances in Transformation-Based Part of Speech Tagging. AAAI, 1994.
[6] Walter Daelemans et al. MBT: A Memory-Based Part of Speech Tagger-Generator. VLC@COLING, 1996.
[7] Penelope Sibun et al. A Practical Part-of-Speech Tagger. ANLP, 1992.