Binary response models with M-phase case-control data

This study presents a general single-index regression model for characterizing the relationship between a dichotomous response and covariates of interest. For M-phase (M >= 2) case-control data, supplemented by information on the response and certain covariates, we propose a pseudo maximum likelihood estimator of the index coefficients. For receiver operating characteristic (ROC) curve analysis, we further provide an estimator of the accuracy measure and use it to seek an optimal linear predictor. To test the hypothesis of model correctness, we employ a pseudo least squares approach to devise suitable testing procedures. We also develop the general theoretical framework for these estimators. Finally, extensive simulations and two empirical applications illustrate the applicability of the methodology.
