The CLAWS Web Tagger
Part-of-speech (POS) tagging, also called grammatical tagging, is the commonest form of corpus annotation, and was the first form of annotation to be developed by UCREL (University Centre for Computer Corpus Research on Language) at Lancaster. Our POS tagging software for English text, CLAWS (the Constituent Likelihood Automatic Word-tagging System), has been continuously developed since the early 1980s. The latest version of the tagger, CLAWS4, was used to POS tag 100 million words of the British National Corpus (BNC); see Garside (1996). Several changes were made to the tagger during our work on the BNC. Tagset independence was added to the software because two tagsets were used in the BNC:
[1] Roger Garside et al. A hybrid grammatical tagger: CLAWS4. 1997.
[2] Roger Garside. The robust tagging of unrestricted text: the BNC experience. 1996.
[3] Geoffrey Leech et al. Using corpora for language research: studies in the honour of Geoffrey Leech. 1996.