Automatic acquisition of sublanguage semantic schema: towards the word sense disambiguation of clinical narratives.

Natural language processing of clinical notes is challenging because of their high degree of semantic ambiguity. Previous research has shown that manually created rules describing the semantic structure of sentences can improve disambiguation accuracy. However, applying a natural language processing system to a new clinical domain with this method is labor-intensive. This paper presents an automatic method for developing such disambiguation rules for a wide range of clinical domains. Our rules are based on the co-occurrence patterns of the semantic types of terms that MetaMap maps unambiguously to UMLS concepts. These patterns are combined into a sublanguage semantic schema that can be used by an existing natural language processing system such as MetaMap. Differences in co-occurrence patterns across clinical notes from different domains are presented as evidence of clinical sublanguages.
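
To make the idea of semantic-type co-occurrence concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that co-occurrence is counted within a sentence, that each term arrives with a list of candidate semantic types (as a tool such as MetaMap might supply), and that an ambiguous term is resolved by picking the candidate type that co-occurs most often with the unambiguous types around it. The function names, the toy sentences, and the scoring rule are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count unordered pairs of semantic types that share a sentence.

    Each sentence is a list of (term, candidate_types) tuples. Only terms
    with exactly one candidate type (unambiguous terms) contribute.
    """
    counts = Counter()
    for sentence in sentences:
        # Distinct semantic types of the unambiguous terms, in sorted order
        # so that each pair has one canonical key.
        types = sorted({cands[0] for _, cands in sentence if len(cands) == 1})
        for pair in combinations(types, 2):
            counts[pair] += 1
    return counts


def disambiguate(candidates, context_types, counts):
    """Pick the candidate type with the highest total co-occurrence count
    against the unambiguous types in the same sentence (assumed scoring)."""
    def score(candidate):
        return sum(counts[tuple(sorted((candidate, t)))] for t in context_types)
    return max(candidates, key=score)


# Toy corpus with illustrative semantic type labels; not real MetaMap output.
sentences = [
    [("cough", ["sosy"]), ("chest", ["bpoc"])],
    [("fever", ["sosy"]), ("pneumonia", ["dsyn"])],
    [("cough", ["sosy"]), ("influenza", ["dsyn"]), ("cold", ["dsyn", "npop"])],
]

counts = cooccurrence_counts(sentences)
# "cold" is ambiguous between two candidate types; "sosy" is its context.
chosen = disambiguate(["dsyn", "npop"], ["sosy"], counts)
```

In this sketch the table of pair counts plays the role of a very small schema: computing it separately for notes from different clinical domains and comparing the resulting tables is one simple way to look for domain-specific patterns.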