A proposal for improving the measurement of parse accuracy

Widespread dissatisfaction has been expressed with the measure of parse accuracy used in the Parseval programme, based on the location of constituent boundaries. Scores on the Parseval metric are perceived as poorly correlated with intuitive judgments of goodness of parse; the metric applies only to a restricted range of grammar formalisms; and it is seen as divorced from applications of NLP technology. The present paper defines an alternative metric, which measures the accuracy with which successive words are fitted into parsetrees. (The original statement of this metric is believed to have been the earliest published proposal about quantifying parse accuracy.) The metric defined here gives overall scores that quantify intuitive concepts of good and bad parsing relatively directly, and it gives scores for individual words which enable the location of parsing errors to be pinpointed. It applies to a wider range of grammar formalisms, and is tunable for specific parsing applications.