Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data: A Comparison of Alternative Econometric Cost Modeling Techniques

Objective: We sought to evaluate several statistical modeling approaches for predicting the prospective total annual health costs (medical plus pharmacy) of health plan participants using Pharmacy Health Dimensions (PHD), a pharmacy claims-based risk index.

Methods: We undertook a 2-year (baseline year/follow-up year) longitudinal analysis of integrated medical and pharmacy claims. Included were plan participants younger than 65 years of age with continuous medical and pharmacy coverage (n = 344,832). PHD drug categories, age, gender, and pharmacy costs were derived across the baseline year. Annual total health costs were calculated for each plan participant in the follow-up year. The models examined were ordinary least squares (OLS) regression, log-transformed OLS regression with a smearing estimator, and three two-part models using OLS regression, log-OLS regression with a smearing estimator, and generalized linear modeling (GLM), respectively. A 10% random sample was withheld for model validation, which was assessed via adjusted r², mean absolute prediction error, specificity, and positive predictive value.

Results: Most PHD drug categories were significant independent predictors of total costs. Among the models tested, the OLS model had the lowest mean absolute prediction error and the highest adjusted r². The log-OLS and two-part log-OLS models did not predict costs accurately because of log-scale heteroscedasticity. The two-part model using GLM had a lower adjusted r² but similar performance on the other assessment measures compared with the OLS and two-part OLS models.

Conclusion: The PHD system, derived solely from pharmacy claims data, can be used to predict future total health costs. Using PHD with a simple OLS model may provide predictive accuracy similar to that of more advanced econometric models.
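To illustrate the log-transformed OLS model with a smearing estimator named in the Methods, here is a minimal Python sketch. It is not the authors' implementation: the function names, the use of NumPy, and the synthetic data are all assumptions made for illustration. The sketch fits OLS on log costs, retransforms predictions with a smearing factor (the mean of the exponentiated log-scale residuals), and scores a 10% holdout with mean absolute prediction error. It assumes strictly positive costs; zero-cost participants would need something like the two-part models the abstract mentions.

```python
import numpy as np


def _design(X):
    """Prepend an intercept column to the predictor matrix."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X])


def fit_ols(X, y):
    """Return least-squares coefficients for y ~ intercept + X."""
    beta, *_ = np.linalg.lstsq(_design(X), np.asarray(y, dtype=float), rcond=None)
    return beta


def predict_linear(X, beta):
    """Linear predictor on the scale the model was fitted on."""
    return _design(X) @ beta


def fit_log_ols_smearing(X, y):
    """Fit OLS on log(y) and compute the smearing factor.

    Assumes y > 0. The smearing factor is the mean of exp(residuals)
    on the log scale; it corrects the downward bias of exp(prediction).
    """
    log_y = np.log(np.asarray(y, dtype=float))
    beta = fit_ols(X, log_y)
    residuals = log_y - predict_linear(X, beta)
    smear = float(np.mean(np.exp(residuals)))
    return beta, smear


def predict_log_ols_smearing(X, beta, smear):
    """Retransform log-scale predictions back to the cost scale."""
    return np.exp(predict_linear(X, beta)) * smear


def mean_absolute_prediction_error(y_true, y_pred):
    """Average absolute gap between observed and predicted costs."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


if __name__ == "__main__":
    # Synthetic, skewed "costs" (illustrative only, not claims data).
    rng = np.random.default_rng(0)
    n = 5000
    X = rng.normal(size=(n, 2))
    y = np.exp(1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(size=n))

    # Withhold a 10% random sample for validation.
    idx = rng.permutation(n)
    test, train = idx[: n // 10], idx[n // 10:]

    beta, smear = fit_log_ols_smearing(X[train], y[train])
    pred = predict_log_ols_smearing(X[test], beta, smear)
    print("smearing factor:", smear)
    print("holdout MAPE:", mean_absolute_prediction_error(y[test], pred))
```

A single smearing factor like this one presumes the log-scale errors have constant variance. When that fails, as the Results report for the log-OLS models, the retransformed predictions can be biased, which is one reason to compare against plain OLS or a GLM on the raw cost scale.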
