Bayesian Analysis of Cancer Rates From SEER Program Using Parametric and Semiparametric Joinpoint Regression Models

Cancer is the second leading cause of death in the United States. Cancer incidence and mortality rates measure the progress against cancer; these rates are obtained from the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute (NCI). Lung cancer has the highest mortality rate among all cancers, whereas prostate cancer has the highest number of new cases among males. In this article, we analyze the incidence rates of these two cancers, as well as colon and rectal cancer. The NCI reports trends in age-adjusted cancer mortality and incidence rates in its annual report to the nation and analyzes them using the Joinpoint software. The locations of the joinpoints signify changes in cancer trends, whereas changes in the regression slope measure the degree of change. The Joinpoint software uses a numerical search to detect the joinpoints, fits the regression between consecutive joinpoints by least squares, and finally selects the number of joinpoints by either a series of permutation tests or the Bayesian information criterion. We propose Bayesian joinpoint models and provide statistical estimates of the joinpoints and the regression slopes. While the Joinpoint software and other work in this area assume that the joinpoints occur on the discrete time grid, we allow a continuous prior for the joinpoints, induced by a Dirichlet distribution on the spacings between them. This prior further allows the user to impose a prespecified minimum gap between two consecutive joinpoints. We develop parametric as well as semiparametric Bayesian joinpoint models; the semiparametric framework relaxes parametric distributional assumptions by modeling the distribution of the regression slopes and error variances with Dirichlet process mixtures. These Bayesian models provide statistical inference with finite-sample validity.
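
The continuous joinpoint prior can be illustrated with a minimal sketch. It assumes a continuous piecewise-linear mean, mu(t) = beta0 + beta1*t + sum_j delta_j*(t - tau_j)_+, and builds the joinpoints tau_j by giving each of the k+1 spacings a fixed minimum gap plus a Dirichlet(alpha, ..., alpha) share of the remaining length. The function names, the symmetric concentration `alpha`, and the choice to enforce the minimum gap at the two end intervals as well are hypothetical illustrations, not necessarily the parameterization used in the article.

```python
import random

def sample_joinpoints(k, t_min, t_max, min_gap, alpha=1.0, rng=random):
    """Draw k ordered joinpoints in (t_min, t_max) whose k+1 spacings
    (including the two end intervals, an assumption of this sketch)
    are all at least min_gap long."""
    n_spacings = k + 1
    slack = (t_max - t_min) - n_spacings * min_gap
    if slack <= 0:
        raise ValueError("min_gap is too large for the interval")
    # Dirichlet draw via independent Gamma(alpha, 1) variates, normalized to sum to 1
    g = [rng.gammavariate(alpha, 1.0) for _ in range(n_spacings)]
    total = sum(g)
    spacings = [min_gap + slack * gi / total for gi in g]
    # Cumulative sums of the first k spacings give the joinpoint locations
    points, t = [], t_min
    for s in spacings[:-1]:
        t += s
        points.append(t)
    return points

def joinpoint_mean(t, beta0, beta1, deltas, taus):
    """Continuous piecewise-linear mean: the slope changes by delta_j at tau_j."""
    return beta0 + beta1 * t + sum(d * max(t - tau, 0.0) for d, tau in zip(deltas, taus))
```

For example, `sample_joinpoints(2, 0.0, 30.0, min_gap=3.0, rng=random.Random(1))` draws two joinpoints over a hypothetical 30-year window with at least 3 years between neighbours. Because every spacing has a continuous distribution, the joinpoints are not restricted to the observation grid.
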
Through a simulation study, we demonstrate the performance of the proposed parametric and semiparametric joinpoint models and compare the results with those from the Joinpoint software. We analyze age-adjusted cancer incidence rates from the SEER Program with these Bayesian models, comparing models with different numbers of joinpoints using the deviance information criterion and a cross-validated predictive criterion. In addition, we model the lung cancer incidence rates and the smoking rates jointly and explore the relation between the two longitudinal processes.
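
Both model-comparison criteria can be computed from posterior draws. The sketch below assumes the standard DIC definition (DIC = Dbar + pD, with pD = Dbar - D(theta_bar)) and, as one common form of a cross-validated predictive criterion, the log pseudo-marginal likelihood (LPML), where each conditional predictive ordinate CPO_i is estimated by the harmonic mean of p(y_i | theta_s) over the draws. The function names and input layout are hypothetical; the article's exact criterion may differ.

```python
import math

def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, pD) from per-draw deviances and the deviance at the
    posterior mean of the parameters. Smaller DIC is preferred."""
    d_bar = sum(deviance_draws) / len(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean  # effective number of parameters
    return d_bar + p_d, p_d

def lpml(loglik):
    """loglik[s][i] = log p(y_i | theta_s) for draw s and observation i.
    Returns sum_i log CPO_i; larger LPML is preferred."""
    n_draws = len(loglik)
    n_obs = len(loglik[0])
    total = 0.0
    for i in range(n_obs):
        # log( mean_s 1/p(y_i|theta_s) ), computed with log-sum-exp for stability
        neg = [-loglik[s][i] for s in range(n_draws)]
        m = max(neg)
        log_mean_inv = m + math.log(sum(math.exp(v - m) for v in neg) / n_draws)
        total += -log_mean_inv  # log CPO_i = -log_mean_inv
    return total
```

For a normal joinpoint model, each deviance draw is -2 times the Gaussian log-likelihood of the data at that draw, and fitting the model once for each candidate number of joinpoints gives one (DIC, LPML) pair per model to compare.
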
