Cost-of-illness studies based on massive data: a prevalence-based, top-down regression approach

Despite the increasing availability of routine data, no analysis method has yet been presented for cost-of-illness (COI) studies based on massive data. We aim, first, to present such a method and, second, to assess the relevance of the associated gain in numerical efficiency. We propose a prevalence-based, top-down regression approach consisting of five steps: aggregating the data; fitting a generalized additive model (GAM); predicting costs via the fitted GAM; comparing predicted costs between prevalent and non-prevalent subjects; and quantifying the stochastic uncertainty via error propagation. To demonstrate the method, we applied it to aggregated German sickness funds data from 1999, covering over 7.3 million insured persons, in the context of chronic lung disease. To assess the gain in numerical efficiency, we compared the computational time of the proposed approach with that of corresponding GAMs fitted to simulated individual-level data. In addition, the probability of model failure was modeled via logistic regression. Applying the proposed method was reasonably fast (19 min). In contrast, with individual-level data, computational time increased disproportionately with sample size, and there was a substantial risk of model failure (about 80% for 6 million subjects). The gain in computational efficiency of the proposed COI method therefore seems to be of practical relevance. Furthermore, the method may yield more precise cost estimates.
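The five steps can be illustrated with a minimal sketch on simulated data. This is not the authors' implementation. It assumes a single covariate (age), substitutes a weighted least-squares fit with a quadratic age term for the GAM, and uses hypothetical function names (`aggregate`, `design`, `fit_wls`, `excess_cost`) and made-up cost parameters. Because the total excess cost in this sketch is linear in the coefficients, Gauss's error propagation reduces to a quadratic form in the coefficient covariance matrix.

```python
import numpy as np

def aggregate(age, prevalent, cost):
    """Step 1: collapse individual records into (age, prevalence) cells."""
    key = age * 2 + prevalent                      # one integer id per cell
    uniq, inv = np.unique(key, return_inverse=True)
    n = np.bincount(inv).astype(float)             # subjects per cell
    mean_cost = np.bincount(inv, weights=cost) / n # mean cost per cell
    return uniq // 2, uniq % 2, n, mean_cost

def design(ages, prev):
    """Design matrix; the quadratic age term is a crude stand-in for a smooth."""
    z = (np.asarray(ages) - 60.0) / 20.0
    p = np.asarray(prev, dtype=float)
    return np.column_stack([np.ones_like(z), z, z**2, p, p * z])

def fit_wls(X, y, w):
    """Step 2: weighted least squares on cell means, weights = cell sizes."""
    XtW = X.T * w
    A = XtW @ X
    beta = np.linalg.solve(A, XtW @ y)
    resid = y - X @ beta
    sigma2 = float(resid @ (w * resid)) / (len(y) - X.shape[1])
    return beta, sigma2 * np.linalg.inv(A)         # coefficients, covariance

def excess_cost(ages, prev, n, beta, cov):
    """Steps 3-5: predict, compare prevalent vs. non-prevalent, propagate error."""
    mask = prev == 1
    a, m = ages[mask], n[mask]
    # Difference in predicted cost per prevalent subject, as a row per age cell
    diff = design(a, np.ones_like(a)) - design(a, np.zeros_like(a))
    g = m @ diff                                   # gradient of total excess cost
    total = float(g @ beta)
    se = float(np.sqrt(g @ cov @ g))               # Gauss error propagation
    return total, se, total / m.sum()

# Simulated individual-level data (all numbers are illustrative assumptions)
rng = np.random.default_rng(0)
n_subjects = 20_000
age = rng.integers(40, 81, size=n_subjects)
prevalent = (rng.random(n_subjects) < 0.1).astype(int)
cost = 1000 + 20 * (age - 40) + 500 * prevalent + rng.normal(0, 300, n_subjects)

ages, prev, n, mean_cost = aggregate(age, prevalent, cost)
beta, cov = fit_wls(design(ages, prev), mean_cost, n)
total, se, per_person = excess_cost(ages, prev, n, beta, cov)
print(f"cells: {len(ages)}, excess cost per prevalent subject: {per_person:.0f}")
print(f"total excess cost: {total:.0f} (SE {se:.0f})")
```

The model is fitted to at most 82 cell means rather than 20,000 records, which is where the gain in numerical efficiency comes from: the size of the regression problem depends on the number of covariate combinations, not on the number of insured persons.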
