A Performance Evaluation of Text-Analysis Technologies

A performance evaluation of 15 text-analysis systems was recently conducted to realistically assess the state of the art for detailed information extraction from unconstrained continuous text. Reports associated with terrorism were chosen as the target domain, and all systems were tested on a collection of previously unseen texts released by a government agency. The competing systems were evaluated for recall, precision, and overgeneration, with multiple strategies used to compute each metric. The results support the claim that systems incorporating natural language processing techniques are more effective than systems based on stochastic techniques alone. The top-scoring systems employed a wide range of language-processing strategies, indicating that many natural language processing techniques provide a viable foundation for sophisticated text analysis. Further evaluation is needed to produce a more detailed assessment of the relative merits of specific technologies and to establish true performance limits for automated information extraction.
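To make the three metrics concrete, the sketch below shows one conventional way they can be computed from raw scoring counts. It is a minimal illustration, not the evaluation's actual scoring procedure (which used multiple scoring strategies per metric); the function name and the counts "correct", "spurious", and "possible" are illustrative assumptions.

```python
# Minimal sketch of recall, precision, and overgeneration, assuming
# MUC-style counts: "correct" = system fills matching the answer key,
# "spurious" = system fills with no counterpart in the key,
# "possible" = fills present in the answer key.
# These names and this scoring function are illustrative, not the
# evaluation's official scoring software.

def score(correct: int, spurious: int, possible: int) -> dict:
    """Compute recall, precision, and overgeneration from raw counts."""
    actual = correct + spurious          # total fills the system produced
    recall = correct / possible if possible else 0.0
    precision = correct / actual if actual else 0.0
    overgeneration = spurious / actual if actual else 0.0
    return {
        "recall": recall,
        "precision": precision,
        "overgeneration": overgeneration,
    }

# Example: 60 correct fills, 20 spurious fills, 100 fills in the answer key
print(score(correct=60, spurious=20, possible=100))
# -> {'recall': 0.6, 'precision': 0.75, 'overgeneration': 0.25}
```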