Analysing bivariate survival data with interval sampling and application to cancer epidemiology.

In biomedical studies, ordered bivariate survival data arise frequently when a pair of successive failure events is used as the outcome to track the progression of a disease. In cancer studies, for example, interest may focus on two failure times: the time from birth to cancer onset and the time from cancer onset to death. This paper considers a sampling scheme, termed interval sampling, in which the first failure event is identified within a calendar time interval, the time of the initiating event can be retrospectively confirmed, and the occurrence of the second failure event is observed subject to right censoring. In a cancer data application, the initiating, first and second events could correspond to birth, cancer onset and death, respectively. Because the data are collected conditional on the first failure event occurring within a time interval, the resulting sample is biased. Interval sampling is widely used for the collection of disease registry data by governments and medical institutions, yet the interval sampling bias is frequently overlooked by researchers. This paper develops statistical methods for analysing such data. Semiparametric methods are proposed under semi-stationarity and stationarity. Numerical studies demonstrate that the proposed estimation approaches perform well with moderate sample sizes. We apply the proposed methods to ovarian cancer registry data.
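The sampling scheme described above can be illustrated with a small simulation. The following is a minimal sketch, not the estimation method proposed in the paper: the function name `simulate_interval_sample`, the uniform birth times on a finite calendar range, the exponential failure time distributions and the window limits are all assumptions chosen only for illustration. It shows that keeping only subjects whose first failure event falls inside a calendar window can shift the distribution of the first failure time relative to the full population, and that the second gap time is right censored at the end of follow-up.

```python
import random


def simulate_interval_sample(n, window_start, window_end, followup_end, seed=0):
    """Simulate interval sampling under purely illustrative assumptions.

    Hypothetical set-up (not taken from the paper): births are uniform on
    the calendar range [0, 100], time from birth to onset is exponential
    with mean 60, and time from onset to death is exponential with mean 5.
    """
    rng = random.Random(seed)
    records = []
    all_t1 = []
    for _ in range(n):
        birth = rng.uniform(0.0, 100.0)      # calendar time of initiating event
        t1 = rng.expovariate(1.0 / 60.0)     # birth -> onset (first failure time)
        t2 = rng.expovariate(1.0 / 5.0)      # onset -> death (second gap time)
        all_t1.append(t1)

        onset = birth + t1                   # calendar time of first failure event
        # Interval sampling: keep the subject only if onset is in the window.
        if not (window_start <= onset <= window_end):
            continue

        # The second event is followed only until the end of follow-up.
        time_to_censoring = followup_end - onset
        records.append({
            "birth": birth,
            "onset": onset,
            "t1": t1,
            "observed_gap": min(t2, time_to_censoring),
            "died": t2 <= time_to_censoring,  # False means right censored
        })
    return records, all_t1


records, all_t1 = simulate_interval_sample(20000, 90.0, 100.0, 100.0, seed=1)
sampled_mean = sum(r["t1"] for r in records) / len(records)
population_mean = sum(all_t1) / len(all_t1)
censored_fraction = sum(1 for r in records if not r["died"]) / len(records)

print("subjects sampled:", len(records))
print("mean time to onset, full population:", round(population_mean, 1))
print("mean time to onset, interval sample:", round(sampled_mean, 1))
print("fraction with censored second event:", round(censored_fraction, 2))
```

In this hypothetical set-up the naive mean of the first failure time in the sampled records falls well below the population mean, because a subject born at calendar time b is included only if the time to onset lies between 90 - b and 100 - b. A naive analysis of the sampled records that ignores this selection would therefore be misleading.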
