Syntactically-informed semantic category recognition in discharge summaries.

Semantic category recognition (SCR) contributes to document understanding. Most approaches to SCR fail to make use of syntax. We hypothesize that syntax, if represented appropriately, can improve SCR. We present a statistical semantic category (SC) recognizer trained with syntactic and lexical contextual clues, as well as ontological information from UMLS, to identify eight semantic categories in discharge summaries. Some of our categories, e.g., test results and findings, include complex entries that span multiple phrases. We achieve classification F-measures above 90% for most categories and show that syntactic context is important for SCR.
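
For readers unfamiliar with the metric, the per-category F-measure reported above is the harmonic mean of precision and recall, computed separately for each category. The sketch below is illustrative only and is not the evaluation code behind the reported numbers: it assumes one label per token, the function name `per_category_f_measure` is hypothetical, and the labels `findings`, `results`, and `none` are placeholders.

```python
from collections import Counter


def per_category_f_measure(gold, predicted):
    """Return {category: F-measure} for two parallel label sequences.

    Assumption: one label per token. Scoring multi-phrase entries as
    whole units would need a span-level matching step instead.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct label for this category
        else:
            fp[p] += 1   # predicted category was wrong here
            fn[g] += 1   # gold category was missed here

    scores = {}
    for cat in set(gold) | set(predicted):
        predicted_total = tp[cat] + fp[cat]
        gold_total = tp[cat] + fn[cat]
        precision = tp[cat] / predicted_total if predicted_total else 0.0
        recall = tp[cat] / gold_total if gold_total else 0.0
        denom = precision + recall
        # Harmonic mean of precision and recall; 0.0 when both are zero.
        scores[cat] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Placeholder labels, not data from the study.
gold = ["findings", "results", "findings", "none", "results"]
predicted = ["findings", "results", "results", "none", "results"]
scores = per_category_f_measure(gold, predicted)
```

In this toy input one `findings` token is mislabeled as `results`, so `results` loses precision and `findings` loses recall, while `none` is unaffected.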