Corpus Annotation and Usable Linguistic Features

In the 50 years of corpus linguistics since the Brown Corpus, corpus annotation has grown steadily more sophisticated. This development has been facilitated by rapid advances in computer technology and, more importantly, propelled by the need for increasingly fine-grained linguistic analysis that yields better descriptive insight into language. That need is in turn largely driven by the spread of ubiquitous computing and of intelligent software that attempts to model human intelligence and mimic human behaviour using the useful features that annotations of different types make available. Man–machine dialogue systems, for example, operate at a high level of linguistic sophistication by drawing on annotations at the levels of lexis, grammar, semantics and speech processing. Even so, the necessity of corpus annotation remains a matter of debate within corpus linguistics: some scholars hold that annotation adds value to linguistic corpora, while others hold that it can be harmful to linguistic insight.