Learning High Precision Rules to Make Predictions of Morbidities in Discharge Summaries

The Duluth entries in the 2008 I2B2 Obesity Challenge used supervised machine learning techniques that relied on bag-of-words unigram features found in discharge summaries to predict whether a patient is obese or suffers from any of 15 related co-morbidities. We found that the RIPPER rule learning algorithm created high-precision models that exceeded the mean precision of all the participating systems by a significant margin. It also discovered simple and informative rules that allow us to better understand the domain. However, no supervised learning algorithm that we experimented with was able to perform well on the minority judgments, which make up less than 1% of the total training and test data.
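
As a rough illustration of the ideas named above, the following sketch shows how unigram bag-of-words features might be extracted from a summary, how a single word-presence rule of the kind a rule learner could produce might be applied, and how precision is computed. It does not implement RIPPER, and it is not the system used in the challenge: the function names, the example rule, and the four toy summaries are all invented for illustration.

```python
import re
from collections import Counter


def unigram_features(text):
    """Lowercase the text and count each word token (bag-of-words unigrams)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def rule_predicts_positive(features, required_words):
    """A hypothetical rule fires when every required word occurs at least once."""
    return all(features[word] > 0 for word in required_words)


def precision(predictions, labels):
    """Fraction of positive predictions that are truly positive."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Invented toy summaries with invented labels; not drawn from the challenge data.
summaries = [
    ("Patient is obese with a history of hypertension.", True),
    ("No acute distress. Weight within normal limits.", False),
    ("Morbidly obese, sleep apnea noted.", True),
    ("Family history of obese relatives; patient is thin.", False),
]

# Hypothetical single-condition rule: predict "obese" if the word "obese" appears.
rule = ["obese"]

predictions = [rule_predicts_positive(unigram_features(text), rule)
               for text, _ in summaries]
labels = [label for _, label in summaries]
rule_precision = precision(predictions, labels)
```

The last toy summary shows a limitation of unigram features: the word "obese" appears but refers to relatives, so a word-presence rule produces a false positive there.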