Accurate and Interpretable Machine Learning for Transparent Pricing of Health Insurance Plans

Health insurance companies cover half of the United States population through commercial employer-sponsored health plans and pay 1.2 trillion US dollars every year to cover medical expenses for their members. Actuaries and underwriters at a health insurance company assess which risks to take on and how to price them so that the organization remains profitable. While Bayesian hierarchical models are the current industry standard for estimating risk, interest is growing in machine learning as a way to improve on these methods. Lumiata, a healthcare analytics company, ran a study with a large health insurance company in the United States. We evaluated the ability of machine learning models to predict the per-member-per-month cost of employer groups in their next renewal period, especially for groups that will cost less than 95\% of what an actuarial model predicts (groups with "concession opportunities"). We developed a sequence of two models, an individual patient-level model and an employer-group-level model, to predict the annual per-member-per-month allowed amount for employer groups, based on a population of 14 million patients. Our models performed 20\% better than the insurance carrier's existing pricing model and identified 84\% of the concession opportunities. This study demonstrates the application of a machine learning system to compute an accurate and fair price for health insurance products, and analyzes how explainable machine learning models can exceed the predictive accuracy of actuarial models while maintaining interpretability.
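The quantities named above can be made concrete with a small sketch. It is an illustration only, not the study's pipeline: it assumes member-level predictions are simply summed and divided by member months to obtain a group's per-member-per-month cost, whereas the study describes a second, employer-group-level model for that step. The `Member` class, the function names, and all numbers are hypothetical; only the 95\% threshold comes from the text.

```python
from dataclasses import dataclass

# From the text: a group is a "concession opportunity" when its cost is
# below 95% of what the actuarial model predicts.
CONCESSION_THRESHOLD = 0.95


@dataclass
class Member:
    group_id: str
    predicted_allowed: float  # hypothetical patient-level prediction, dollars
    member_months: int        # months enrolled in the renewal period


def group_pmpm(members):
    """Sum predicted dollars and member months per group, then divide.

    Assumption: plain aggregation stands in for the group-level model.
    """
    dollars, months = {}, {}
    for m in members:
        dollars[m.group_id] = dollars.get(m.group_id, 0.0) + m.predicted_allowed
        months[m.group_id] = months.get(m.group_id, 0) + m.member_months
    return {g: dollars[g] / months[g] for g in dollars if months[g] > 0}


def is_concession(pmpm, actuarial_pmpm, threshold=CONCESSION_THRESHOLD):
    """True when the cost falls below the threshold share of the actuarial estimate."""
    return pmpm < threshold * actuarial_pmpm


def concession_recall(predicted, actual, actuarial):
    """Share of true concession opportunities that the predictions also flag."""
    true_groups = [g for g in actual if is_concession(actual[g], actuarial[g])]
    if not true_groups:
        return None
    hits = sum(
        1
        for g in true_groups
        if g in predicted and is_concession(predicted[g], actuarial[g])
    )
    return hits / len(true_groups)


# Made-up example data for two employer groups.
members = [
    Member("A", 6000.0, 12),
    Member("A", 2400.0, 12),
    Member("B", 12000.0, 12),
    Member("B", 3000.0, 6),
]
actuarial = {"A": 400.0, "B": 800.0}  # hypothetical actuarial PMPM estimates
actual = {"A": 360.0, "B": 700.0}     # hypothetical realized PMPM costs

predicted = group_pmpm(members)
recall = concession_recall(predicted, actual, actuarial)
```

In this toy data both groups are true concession opportunities, but only group A is predicted to fall below the threshold, so the recall is 0.5. The 84\% figure reported above is the analogous share measured on the study's own data.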
