A flexible model for the mean and variance functions, with application to medical cost data

Medical cost data are often right-skewed and heteroscedastic, and their relation to covariates is often nonlinear. To address these issues, we extend generalized linear models by allowing covariates to enter the mean function nonlinearly and by allowing the variance to be an unknown but smooth function of the mean. We make no further assumption about the distributional form. The unknown functions are represented by penalized splines, and estimation is carried out using nonparametric quasi-likelihood. Simulation studies illustrate the flexibility and advantages of our approach. We apply the model to the annual medical costs of heart failure patients in the clinical data repository at the University of Virginia Hospital System.
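
The estimation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several simplifying assumptions that do not come from the abstract: a single covariate, an identity link, a truncated-linear spline basis with a ridge penalty on the knot coefficients, fixed smoothing parameters, and a variance function obtained by smoothing squared residuals against the fitted mean. All function and parameter names (`truncated_linear_basis`, `penalized_fit`, `fit_mean_variance`, `lam_mean`, `lam_var`, `floor_frac`) are hypothetical. The sketch alternates between a weighted penalized fit of the mean, with weights 1/V(mu) as in a quasi-likelihood score equation, and a penalized-spline update of V.

```python
import numpy as np


def truncated_linear_basis(x, knots):
    """Design matrix with columns 1, x, (x - k_1)_+, ..., (x - k_K)_+."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x]
    cols += [np.clip(x - k, 0.0, None) for k in knots]
    return np.column_stack(cols)


def penalized_fit(B, y, w, lam):
    """Weighted least squares with a ridge penalty on the knot coefficients only."""
    D = np.eye(B.shape[1])
    D[0, 0] = D[1, 1] = 0.0          # intercept and slope are left unpenalized
    BtW = B.T * w                    # same as B.T @ diag(w)
    return np.linalg.solve(BtW @ B + lam * D, BtW @ y)


def interior_quantile_knots(z, n_knots):
    """Knots placed at equally spaced interior quantiles of z."""
    probs = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]
    return np.quantile(z, probs)


def fit_mean_variance(x, y, n_knots=10, n_knots_var=5,
                      lam_mean=1.0, lam_var=1.0, n_iter=10, floor_frac=0.1):
    """Alternate between a weighted mean fit and a variance-function fit.

    Assumptions of this sketch: identity link, fixed smoothing parameters,
    and a positive floor on the estimated variance to keep weights bounded.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    Bx = truncated_linear_basis(x, interior_quantile_knots(x, n_knots))
    w = np.ones_like(y)              # start from a constant variance
    for _ in range(n_iter):
        # Mean step: solve the penalized, variance-weighted normal equations.
        beta = penalized_fit(Bx, y, w, lam_mean)
        mu = Bx @ beta
        # Variance step: smooth squared residuals as a function of the fitted mean.
        r2 = (y - mu) ** 2
        Bm = truncated_linear_basis(mu, interior_quantile_knots(mu, n_knots_var))
        gamma = penalized_fit(Bm, r2, np.ones_like(y), lam_var)
        v = np.clip(Bm @ gamma, floor_frac * r2.mean(), None)
        w = 1.0 / v
    return mu, v, beta


if __name__ == "__main__":
    # Simulated data with a standard deviation proportional to the mean.
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, 2000)
    true_mean = 2.0 + np.sin(2.0 * np.pi * x)
    y = true_mean + 0.3 * true_mean * rng.standard_normal(x.size)
    mu, v, _ = fit_mean_variance(x, y)
    print("mean absolute error of fitted mean:", np.mean(np.abs(mu - true_mean)))
```

In practice the smoothing parameters would be chosen from the data rather than fixed, and a log link is a common choice for cost data; both are left out here to keep the sketch short.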
