The CLAWS Web Tagger

Part-of-speech (POS) tagging, also called grammatical tagging, is the commonest form of corpus annotation, and was the first form of annotation to be developed by UCREL (University Centre for Computer Corpus Research on Language) at Lancaster. Our POS tagging software for English text, CLAWS (the Constituent Likelihood Automatic Word-tagging System), has been continuously developed since the early 1980s. The latest version of the tagger, CLAWS4, was used to POS-tag 100 million words of the British National Corpus (BNC); see Garside (1996). We made several changes to the tagger during our work on the BNC. One of these was tagset independence, added to the software because two tagsets were used in the BNC: