SEMIPARAMETRIC ZERO-INFLATED MODELING IN MULTI-ETHNIC STUDY OF ATHEROSCLEROSIS (MESA).

We analyze the Agatston score of coronary artery calcium (CAC) from the Multi-Ethnic Study of Atherosclerosis (MESA) using a semiparametric zero-inflated modeling approach. The observed CAC scores in this cohort consist of a high frequency of zeros together with continuously distributed positive values. Both partially constrained and unconstrained models are considered to investigate the biological processes underlying CAC development from zero to positive, and from small amounts to large amounts. Unlike existing studies, we adopt a model selection procedure based on likelihood cross-validation to identify the optimal model, a choice justified by comparative Monte Carlo studies. A shrinkage version of the cubic regression spline is used to perform model estimation and variable selection simultaneously. Applying the proposed methods to the MESA data, we show that the two biological mechanisms, one influencing the initiation of CAC and the other the magnitude of CAC when it is positive, are better characterized by an unconstrained zero-inflated normal model. Our results differ significantly from those of published studies and may provide further insight into the biological mechanisms underlying CAC development in humans. This highly flexible statistical framework can be applied to zero-inflated data analyses in other areas.
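To make the idea of likelihood cross-validation for a zero-inflated model concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified, fully parametric unconstrained two-part model: a logistic regression for whether the score is positive, and a normal linear model for the log of the positive scores. The spline smoothers and shrinkage penalty described in the abstract are omitted, and the simulated data, the covariate `age`, and all function names are hypothetical.

```python
import numpy as np

def fit_logistic(X, z, n_iter=50):
    """Fit P(z = 1 | X) by Newton-Raphson; returns the coefficient vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        # Tiny ridge term keeps the Hessian invertible (numerical safeguard).
        H = X.T @ (X * W[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(H, X.T @ (z - p))
    return beta

def fit_two_part(X, y):
    """Unconstrained two-part fit: separate coefficients for each part."""
    pos = y > 0
    beta = fit_logistic(X, pos.astype(float))            # zero vs. positive
    log_y = np.log(y[pos])
    gamma, *_ = np.linalg.lstsq(X[pos], log_y, rcond=None)  # size if positive
    sigma = np.sqrt(np.mean((log_y - X[pos] @ gamma) ** 2))
    return beta, gamma, sigma

def log_lik(params, X, y):
    """Log-likelihood of (X, y) under fitted parameters.

    The Jacobian term -log(y) of the log transform is dropped: it is the
    same for every candidate model, so it does not affect the comparison.
    """
    beta, gamma, sigma = params
    pos = y > 0
    eta = X @ beta
    ll = -np.logaddexp(0.0, -eta[pos]).sum()   # log P(y > 0)
    ll += -np.logaddexp(0.0, eta[~pos]).sum()  # log P(y = 0)
    r = np.log(y[pos]) - X[pos] @ gamma
    ll += np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * r**2 / sigma**2)
    return ll

def cv_log_lik(X, y, k=5, seed=0):
    """Sum of held-out log-likelihoods over k folds (larger is better)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    total = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        total += log_lik(fit_two_part(X[train], y[train]), X[fold], y[fold])
    return total

# Hypothetical simulated data standing in for CAC scores.
rng = np.random.default_rng(1)
n = 600
age = rng.uniform(-1.0, 1.0, n)                       # standardized covariate
p_pos = 1.0 / (1.0 + np.exp(-(-0.5 + 2.0 * age)))     # P(score > 0)
is_pos = rng.random(n) < p_pos
y = np.where(is_pos, np.exp(3.0 + 1.5 * age + rng.normal(0.0, 1.0, n)), 0.0)

X_full = np.column_stack([np.ones(n), age])  # intercept + age
X_null = np.ones((n, 1))                     # intercept only

cv_full = cv_log_lik(X_full, y)
cv_null = cv_log_lik(X_null, y)
print("CV log-likelihood, with age:   ", cv_full)
print("CV log-likelihood, without age:", cv_null)
```

The candidate with the larger cross-validated log-likelihood is preferred. The same comparison could in principle be run between constrained and unconstrained variants, provided each supplies its own fitting routine and evaluates the same held-out likelihood.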
