The TIPSTER/SHOGUN Project

This paper presents an overview of the TIPSTER/SHOGUN project, its major results, and the SHOGUN data extraction system. TIPSTER/SHOGUN was a joint effort of GE Corporate Research and Development, Carnegie Mellon University, and Martin Marietta Management and Data Systems (formerly GE Aerospace), carried out as part of the ARPA TIPSTER Text program. Two of the main technical thrusts of the project were (1) the development of a model of finite-state approximation, in which the accuracy of more detailed models of language interpretation could be realized in a simple, efficient framework, and (2) experiments in automated knowledge acquisition, to minimize customization and ease the tuning and extension of the system. Innovations in each of these areas allowed the project to meet its goal of achieving advances in coverage and accuracy while showing consistently good performance across languages and domains.