Testing calibration of risk models at extremes of disease risk.

Risk-prediction models need careful calibration to ensure that they produce unbiased estimates of risk for subjects in the underlying population, given their risk-factor profiles. Because subjects at extremely high or low risk may be the most affected by knowledge of their risk estimates, checking the adequacy of risk models at the extremes of risk is particularly important for clinical applications. We propose a new approach to testing model calibration that targets the extremes of the disease-risk distribution, where standard goodness-of-fit tests may lack power because data are sparse. We construct a test statistic based on model residuals summed over only those individuals whose predicted risk passes a high and/or low risk threshold, and then maximize the test statistic over different risk thresholds. We derive an asymptotic distribution for the max-test statistic based on an analytic derivation of the variance-covariance function of the underlying Gaussian process. The method is applied to a large case-control study of breast cancer to examine the joint effects of common single nucleotide polymorphisms (SNPs) discovered through recent genome-wide association studies. The analysis clearly indicates a non-additive effect of the SNPs on the scale of absolute risk, but an excellent fit for the linear-logistic model even at the extremes of risk.
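
The construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`tail_statistic`, `max_tail_statistic`, `simulated_p_value`), the standardization by the binomial variance, and the grid of thresholds are all assumptions made for illustration. In particular, the sketch replaces the analytic Gaussian-process null distribution with a simple simulation that treats the predicted risks as fixed, so it ignores the uncertainty from estimating the model parameters.

```python
import numpy as np


def tail_statistic(y, p, threshold, tail="upper"):
    """Standardized sum of residuals (y - p) over subjects beyond a threshold.

    y: 0/1 outcomes; p: predicted risks, assumed strictly between 0 and 1.
    The binomial-variance standardization is an assumption of this sketch.
    """
    mask = p >= threshold if tail == "upper" else p <= threshold
    if not mask.any():
        return 0.0  # no subject passes the threshold
    resid_sum = np.sum(y[mask] - p[mask])
    variance = np.sum(p[mask] * (1.0 - p[mask]))
    return resid_sum / np.sqrt(variance)


def max_tail_statistic(y, p, thresholds, tail="upper"):
    """Maximum absolute tail statistic over a grid of risk thresholds."""
    return max(abs(tail_statistic(y, p, t, tail)) for t in thresholds)


def simulated_p_value(y, p, thresholds, tail="upper", n_sim=2000, seed=0):
    """Approximate the null distribution of the max statistic by simulation.

    Outcomes are redrawn as Bernoulli(p) with p held fixed. This is a
    stand-in for the analytic asymptotic distribution, not a replacement.
    """
    rng = np.random.default_rng(seed)
    observed = max_tail_statistic(y, p, thresholds, tail)
    simulated = np.array([
        max_tail_statistic(rng.binomial(1, p), p, thresholds, tail)
        for _ in range(n_sim)
    ])
    p_value = (1 + np.sum(simulated >= observed)) / (n_sim + 1)
    return observed, p_value
```

As a usage example, one might pass the upper deciles of the predicted risks as `thresholds` (for instance `np.quantile(p, [0.90, 0.95, 0.99])`) to focus the test on the high-risk tail, or set `tail="lower"` with low quantiles for the low-risk tail.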
