A statistical method for studying correlated rare events and their risk factors

Longitudinal studies of rare events that can recur, such as high-grade cervical lesions or colorectal polyps, often involve correlated binary data. Risk factors for these events cannot be reliably examined using conventional statistical methods. For example, logistic regression models that incorporate generalized estimating equations often fail to converge or provide inaccurate results when analyzing data of this type. Although exact methods have been reported, they are complex and computationally difficult. The current paper proposes a mathematically straightforward and easy-to-use two-step approach involving (i) an additive model to measure associations between a rare or uncommon correlated binary event and potential risk factors and (ii) a permutation test to estimate the statistical significance of these associations. Simulation studies showed that the proposed method reliably tests and accurately estimates the associations of exposure with rare correlated binary events. The method was then applied to a longitudinal study of human leukocyte antigen (HLA) genotype and risk of cervical high-grade squamous intraepithelial lesions (HSIL) among HIV-infected and HIV-uninfected women. Results showed statistically significant associations of two HLA alleles with cervical HSIL among HIV-negative but not HIV-positive women, suggesting that immune status may modify the association between HLA and cervical HSIL. Overall, the proposed method avoids model nonconvergence problems and provides a computationally simple, accurate, and powerful approach for analyzing risk factor associations with rare or uncommon correlated binary events.
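The two-step idea can be illustrated with a minimal Python sketch. The abstract does not give the form of the additive model or the permutation scheme, so the sketch rests on assumptions: the association is measured as a per-visit risk difference between exposed and unexposed subjects, and significance is assessed by shuffling exposure labels across subjects while keeping each subject's repeated outcomes together, which preserves the within-subject correlation. The function names, the toy data, and the number of permutations are hypothetical and are not taken from the paper.

```python
import random

def risk_difference(exposure, outcomes):
    """Per-visit event proportion in exposed minus unexposed subjects.

    exposure: list of 0/1 flags, one per subject.
    outcomes: list of lists of 0/1 visit outcomes, one list per subject
              (assumed: every subject has at least one visit).
    """
    events = {0: 0, 1: 0}
    visits = {0: 0, 1: 0}
    for x, ys in zip(exposure, outcomes):
        events[x] += sum(ys)
        visits[x] += len(ys)
    return events[1] / visits[1] - events[0] / visits[0]

def permutation_pvalue(exposure, outcomes, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the risk difference.

    Exposure labels are shuffled across subjects; each subject's outcome
    vector stays intact, so within-subject correlation is preserved.
    """
    rng = random.Random(seed)
    observed = risk_difference(exposure, outcomes)
    shuffled = list(exposure)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        permuted = risk_difference(shuffled, outcomes)
        # Small tolerance so ties with the observed value count as extreme.
        if abs(permuted) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)

# Invented toy data: 4 exposed and 6 unexposed subjects, 2 to 4 visits each.
exposure = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
outcomes = [
    [0, 1, 1], [0, 0, 1], [1, 0], [0, 0, 0],
    [0, 0, 0], [0, 0], [0, 0, 0], [0, 0, 0, 0], [0, 0], [0, 0, 0],
]

rd, p = permutation_pvalue(exposure, outcomes)
print(f"risk difference = {rd:.3f}, permutation p-value = {p:.4f}")
```

Permuting at the subject level rather than the visit level is the key design choice in this sketch: shuffling individual visits would treat repeated observations as independent and would tend to understate the p-value.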
