Paper information - Part-of-Speech Tagging Using Progol

Part-of-Speech Tagging Using Progol

A system for ‘tagging’ words with their part-of-speech (POS) tags is constructed. The system has two components: a lexicon containing the set of possible POS tags for a given word, and rules which use a word's context to eliminate possible tags for a word. The Inductive Logic Programming (ILP) system Progol is used to induce these rules in the form of definite clauses. The final theory contained 885 clauses. For background knowledge, Progol uses a simple grammar, where the tags are terminals and predicates such as nounp (noun phrase) are non-terminals. Progol was altered to allow the caching of information about clauses generated during the induction process, which greatly increased efficiency. The system achieved a per-word accuracy of 96.4% on known words drawn from sentences without quotation marks. This is on a par with other tagging systems induced from the same data [5, 2, 4], which all have accuracies in the range 96–97%. The per-sentence accuracy was 49.5%.
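The two-component design described in the abstract (a lexicon of candidate tags, plus context rules that eliminate candidates) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the actual rules are definite clauses induced by Progol, whereas the lexicon entries, tag names, and both rules below are invented for illustration. Applying rules repeatedly until nothing changes, and never removing a word's last remaining tag, are also assumptions of this sketch.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy lexicon: each known word maps to its set of possible tags.
LEXICON: Dict[str, Set[str]] = {
    "the": {"DT"},
    "can": {"MD", "NN", "VB"},
    "rusts": {"VBZ", "NNS"},
}

# A rule looks at the candidate tag sets around position i and returns
# the tags it wants to eliminate at position i (possibly none).
Rule = Callable[[List[Set[str]], int], Set[str]]


def no_modal_or_verb_after_determiner(cands: List[Set[str]], i: int) -> Set[str]:
    # Invented rule: directly after an unambiguous determiner, drop MD and VB.
    if i > 0 and cands[i - 1] == {"DT"}:
        return {"MD", "VB"}
    return set()


def no_plural_noun_after_singular_noun(cands: List[Set[str]], i: int) -> Set[str]:
    # Invented rule: directly after an unambiguous singular noun, drop NNS.
    if i > 0 and cands[i - 1] == {"NN"}:
        return {"NNS"}
    return set()


RULES: List[Rule] = [
    no_modal_or_verb_after_determiner,
    no_plural_noun_after_singular_noun,
]


def tag_candidates(words: List[str]) -> List[Set[str]]:
    """Look up candidate tags, then apply elimination rules to a fixpoint."""
    cands = [set(LEXICON[w]) for w in words]  # known words only
    changed = True
    while changed:  # terminates because candidate sets only ever shrink
        changed = False
        for i in range(len(cands)):
            for rule in RULES:
                remaining = cands[i] - rule(cands, i)
                # Assumption: never eliminate the last remaining tag.
                if remaining and remaining != cands[i]:
                    cands[i] = remaining
                    changed = True
    return cands


if __name__ == "__main__":
    print(tag_candidates(["the", "can", "rusts"]))
```

In this toy run, the first rule reduces "can" to a noun once "the" is known to be a determiner, which in turn lets the second rule disambiguate "rusts". Words left with more than one candidate after elimination would still need a final choice, which this sketch does not model.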

James Cussens

[1] Atro Voutilainen, et al. Inducing constraint grammars, 1996, ICGI.

[2] Lawrence R. Rabiner, et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.

[3] Atro Voutilainen, et al. Tagging accurately - Don't guess if you know, 1994, ANLP.

[4] Steven Abney, et al. Part-of-Speech Tagging and Partial Parsing, 1997.

[5] Eric Brill, et al. Some Advances in Transformation-Based Part of Speech Tagging, 1994, AAAI.

[6] Walter Daelemans, et al. MBT: A Memory-Based Part of Speech Tagger-Generator, 1996, VLC@COLING.

[7] Penelope Sibun, et al. A Practical Part-of-Speech Tagger, 1992, ANLP.