Towards systematic grammar profiling. Test suite technology 10 years after

Abstract

An experiment with recent test suite and grammar (engineering) resources is outlined: a critical assessment of the EU-funded TSNLP (Test Suites for Natural Language Processing) package as a diagnostic and benchmarking facility for a distributed (multi-site), large-scale HPSG grammar engineering effort. This paper argues for a generalized, systematic, and fully automated testing and diagnosis facility as an integral part of the linguistic engineering cycle, and gives a practical assessment of existing resources; both a flexible methodology and tools for competence and performance profiling are presented. By comparison with earlier evaluation work as reflected in the Hewlett-Packard test suite data, released exactly 10 years before TSNLP, the paper judges where test-suite-based evaluation has improved over time (and where it has not).