Evaluation of NLP systems

Computational linguistics as a science has had evaluation methods since its early days. A concordance program can be evaluated on its ability to find all occurrences, to list them properly, to offer a flexible user interface, and so on; frequency programs may be evaluated on the statistics they provide and on whether they support lemmatisation; parsers are evaluated on their efficiency, etc. When we consider one component at a time and want a technical evaluation, we normally have no difficulty defining the evaluation criteria.
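As an illustration of how concrete such component-level criteria can be, the following is a minimal sketch, under stated assumptions, of checking one of them: whether a concordance program finds all occurrences of a word. It assumes a toy keyword-in-context program (the hypothetical `concordance` function, matching whole word forms only) and a hand-made reference list of the character offsets where the word really occurs; the hypothetical `occurrence_recall` function reports the share of those reference occurrences the program actually returned. Neither function comes from the original text.

```python
import re


def concordance(text: str, keyword: str, width: int = 20) -> list[tuple[int, str]]:
    """Hypothetical concordance program under test.

    Returns (offset, KWIC line) for every whole-word, case-insensitive match.
    It does no lemmatisation, so inflected forms such as plurals are missed.
    """
    hits = []
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        hits.append((m.start(), f"{left:>{width}} [{m.group(0)}] {right}"))
    return hits


def occurrence_recall(found_offsets, gold_offsets) -> float:
    """Share of the reference (gold) occurrences that the program found."""
    gold = set(gold_offsets)
    if not gold:
        return 1.0
    return len(gold & set(found_offsets)) / len(gold)


text = "Parsers parse text. A parser is evaluated; the parser is fast."
# Hand-made reference: offsets of "Parsers" (0) and the two "parser" tokens.
gold = [0, 22, 47]
found = [offset for offset, _ in concordance(text, "parser")]
print(found, occurrence_recall(found, gold))
```

Here the program finds the two occurrences of "parser" but misses the plural "Parsers", so it scores 2/3 on this criterion; the same test makes visible why the possibility of lemmatisation matters as a separate criterion. Other criteria mentioned above, such as a flexible user interface, cannot be reduced to a single number in this way.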