Analysis of Polarity Information in Medical Text

Knowing the polarity of clinical outcomes is important for answering the questions clinicians pose during patient treatment. We treat the analysis of this information as a classification problem. Natural language processing and machine learning techniques are applied to detect four possibilities in medical text: no outcome, positive outcome, negative outcome, and neutral outcome. A supervised learning method performs the classification at the sentence level. Five feature sets are constructed: unigrams, bigrams, change phrases, negations, and categories. The performance of different combinations of these feature sets is compared. The results show that generalization using category information from the Unified Medical Language System (UMLS), a domain knowledge base, is effective for the task. Context information also has a significant effect. Combining linguistic features with domain knowledge yields the highest accuracy.
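
As an illustration only, the five feature sets named above could be extracted from a sentence roughly as follows. This is a minimal sketch, not the method used in this work: the word lists `CHANGE_WORDS` and `NEGATION_CUES`, the `CATEGORIES` mapping (a toy stand-in for UMLS category lookup), the feature-name prefixes, and the function names are all hypothetical. In particular, treating a change phrase as a change word plus the following token is an assumption made for the example.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, assumed for illustration only.
CHANGE_WORDS = {"increase", "increased", "decrease", "decreased",
                "reduce", "reduced", "improve", "improved"}
NEGATION_CUES = {"no", "not", "without", "never"}
# Toy stand-in for UMLS category lookup; a real system would query UMLS.
CATEGORIES = {"aspirin": "CAT_drug", "stroke": "CAT_disease",
              "mortality": "CAT_finding"}


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def extract_features(sentence):
    """Return a bag of features drawn from the five feature sets."""
    tokens = tokenize(sentence)
    feats = Counter()
    # Unigrams
    for tok in tokens:
        feats["uni=" + tok] += 1
    # Bigrams
    for a, b in zip(tokens, tokens[1:]):
        feats["bi=" + a + "_" + b] += 1
    for i, tok in enumerate(tokens):
        # Change phrases: assumed here to be a change word plus the next token
        if tok in CHANGE_WORDS and i + 1 < len(tokens):
            feats["change=" + tok + "_" + tokens[i + 1]] += 1
        # Negations: count of negation cue words
        if tok in NEGATION_CUES:
            feats["neg_cue"] += 1
        # Categories: replace a known term by its (toy) category label
        if tok in CATEGORIES:
            feats["cat=" + CATEGORIES[tok]] += 1
    return feats


if __name__ == "__main__":
    for name, count in sorted(extract_features("Aspirin reduced mortality.").items()):
        print(name, count)
```

The resulting feature counts could be passed to any supervised sentence-level classifier. Combinations of feature sets can be compared by keeping only the features whose prefix belongs to the sets under test.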