Early Detection of Diabetes from Health Claims

Early detection of Type 2 diabetes poses challenges to both the machine learning and medical communities. Current clinical practice focuses on narrow, patient-specific courses of action, whereas electronic health records and insurance claims data make it possible to generalize that knowledge across large populations. Advances in population health care have the potential to improve patients' health and to decrease future medical costs, at least in part by preventing the long-term complications that accrue while diabetes remains undiagnosed. Using patient data from insurance claims, we present the results of our initial experiments on identifying patients who will develop diabetes. We motivate future work in this area by considering the need for machine learning algorithms that can deal effectively with the depth and variety of the data.