IndexFinder: A Knowledge-based Method for Indexing Clinical Texts

Extracting key concepts from clinical texts for indexing is an important task in implementing a medical digital library. Several methods have been proposed in the literature for mapping free text into terms controlled by the Unified Medical Language System (UMLS). MetaMap and other methods use natural language processing (NLP) techniques to map identified noun phrases into concepts; they are, however, not appropriate for building a fast online application. We present a new algorithm that efficiently generates all possible UMLS phrases in a text, from which key concepts are then identified by syntactic and semantic filtering. We have implemented the algorithm as a web-based service that provides a search interface for researchers and computer programs. In a preliminary manual examination of the 456 concepts extracted for 100 topic sentences, we found that our method discovered 18 (4%) additional phrases that cannot be obtained from a single noun phrase, and that the results contained no improper combinations. Our empirical experiment shows that the algorithm is effective at discovering relevant UMLS concepts while achieving a throughput of 43 KB of text per second. The tool can therefore be used to extract key concepts from clinical texts for indexing.
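The idea of generating every controlled-vocabulary phrase whose words occur in a text, and then filtering the candidates, can be illustrated with a minimal sketch. This is not the IndexFinder implementation: the phrase table, the concept identifiers (C001 and so on), the function names, and the subset-based filter are all hypothetical stand-ins chosen for illustration, and a real system would load its phrases from UMLS and apply the syntactic and semantic filters described in the paper.

```python
import re
from collections import defaultdict

# Hypothetical toy phrase table (assumption): ids and phrases are made up,
# standing in for entries that a real system would load from UMLS.
PHRASES = {
    "C001": "lung cancer",
    "C002": "small cell lung cancer",
    "C003": "cancer",
    "C004": "pleural effusion",
}

def build_index(phrases):
    """Return (word -> phrase ids, phrase id -> set of words)."""
    word_to_ids = defaultdict(set)
    phrase_words = {}
    for pid, phrase in phrases.items():
        words = set(phrase.lower().split())
        phrase_words[pid] = words
        for word in words:
            word_to_ids[word].add(pid)
    return word_to_ids, phrase_words

def candidate_phrases(text, word_to_ids, phrase_words):
    """Return ids of phrases whose words ALL occur in the text, in any order."""
    seen = defaultdict(set)
    for word in set(re.findall(r"[a-z0-9]+", text.lower())):
        for pid in word_to_ids.get(word, ()):
            seen[pid].add(word)
    return {pid for pid, words in seen.items() if words == phrase_words[pid]}

def drop_subsumed(matched, phrase_words):
    """Illustrative filter (assumption): drop a phrase whose word set is a
    proper subset of another matched phrase's word set."""
    return {
        p for p in matched
        if not any(phrase_words[p] < phrase_words[q] for q in matched)
    }

if __name__ == "__main__":
    index, words = build_index(PHRASES)
    text = "Small cell cancer of the lung with effusion."
    matched = candidate_phrases(text, index, words)
    print(sorted(matched))
    print(sorted(drop_subsumed(matched, words)))
```

Because matching is done on word sets rather than on parsed noun phrases, the sketch can recover "small cell lung cancer" even though its words are spread across the sentence, which is the kind of phrase a single-noun-phrase mapping would miss. The inverted index keeps the lookup cost proportional to the number of words in the text, which is one plausible way to reach high throughput.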