EM for regularized zero‐inflated regression models with applications to postoperative morbidity after cardiac surgery in children

This paper proposes a new statistical approach for predicting postoperative morbidity, such as intensive care unit length of stay and number of complications, after cardiac surgery in children. In a recent multi‐center study sponsored by the National Institutes of Health, 311 children undergoing cardiac surgery were enrolled. Morbidity data are count data, in which observations take only nonnegative integer values. Often, a simple model cannot properly accommodate the number of zeros in the sample, so a more complex model such as the zero‐inflated Poisson regression model is required. We are interested in identifying important risk factors for postoperative morbidity among many candidate predictors. Methodological work on variable selection for zero‐inflated regression models is limited. In this paper, we consider regularized zero‐inflated Poisson models through a penalized likelihood function and develop a new expectation–maximization algorithm for numerical optimization. Simulation studies show that the proposed method performs better than some competing methods. Applying the proposed methods to the postoperative morbidity data improved the model fit and identified important clinical and biomarker risk factors. Copyright © 2014 John Wiley & Sons, Ltd.
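To make the abstract's ingredients concrete, the following is a minimal sketch of a zero‐inflated Poisson log‐likelihood, a penalized objective, and the E‐step of an expectation–maximization iteration. It is not the paper's implementation. The lasso (L1) penalty, the scaling of the penalty by the sample size, the unpenalized intercepts, and all function and variable names (`zip_log_pmf`, `penalized_neg_loglik`, `e_step`, `beta`, `gamma`, `lam`) are assumptions chosen for illustration; the abstract does not specify which penalty the authors use. The model form assumed here is the standard one: a log link for the Poisson mean and a logit link for the zero‐inflation probability.

```python
import math

def linear(x, coef):
    """Inner product of one covariate row with a coefficient vector."""
    return sum(xj * cj for xj, cj in zip(x, coef))

def zip_log_pmf(y, mu, pi):
    """Log-probability of count y under a zero-inflated Poisson model.

    mu is the Poisson mean (> 0); pi is the probability of a structural zero.
    """
    if y == 0:
        # A zero arises either structurally or from the Poisson component.
        return math.log(pi + (1.0 - pi) * math.exp(-mu))
    log_pois = -mu + y * math.log(mu) - math.lgamma(y + 1)
    return math.log(1.0 - pi) + log_pois

def mean_and_zero_prob(x, z, beta, gamma):
    """Log link for the count mean, logit link for the zero-inflation probability."""
    mu = math.exp(linear(x, beta))
    pi = 1.0 / (1.0 + math.exp(-linear(z, gamma)))
    return mu, pi

def penalized_neg_loglik(y, X, Z, beta, gamma, lam):
    """Negative ZIP log-likelihood plus an L1 penalty (an assumed penalty choice).

    Index 0 of beta and gamma is treated as an unpenalized intercept.
    """
    nll = 0.0
    for yi, xi, zi in zip(y, X, Z):
        mu, pi = mean_and_zero_prob(xi, zi, beta, gamma)
        nll -= zip_log_pmf(yi, mu, pi)
    l1 = sum(abs(b) for b in beta[1:]) + sum(abs(g) for g in gamma[1:])
    return nll + len(y) * lam * l1

def e_step(y, X, Z, beta, gamma):
    """Posterior probability that each observation is a structural zero."""
    post = []
    for yi, xi, zi in zip(y, X, Z):
        if yi > 0:
            post.append(0.0)  # a positive count cannot be a structural zero
            continue
        mu, pi = mean_and_zero_prob(xi, zi, beta, gamma)
        post.append(pi / (pi + (1.0 - pi) * math.exp(-mu)))
    return post
```

In an EM scheme of this kind, the posterior probabilities from `e_step` would serve as weights in the M‐step, which then separates into a penalized weighted logistic regression for `gamma` and a penalized weighted Poisson regression for `beta`. The M‐step is omitted here because its details depend on the penalty and the solver.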
