Within-Cluster Resampling for Analysis of Family Data: Ready for Prime-Time?

Hoffman et al. [1] proposed an elegant resampling method for analyzing clustered binary data. Their paper focused on association tests for clustered binary data using the within-cluster resampling (WCR) method. Follmann et al. [2] generalized Hoffman et al.'s procedure, making it applicable to angular data, combining p-values, testing vectors of parameters, and Bayesian inference. Follmann et al. [2] termed their procedure multiple outputation because all "excess" data within each cluster are thrown out multiple times. Herein, we refer to this procedure as WCR-MO. To be useful for a particular design, a statistical test must be robust, have adequate power, and be flexible and easy to implement. WCR-MO extends easily to continuous data; it is computationally intensive but simple and highly flexible. Treating each family as a cluster, one can apply WCR to familial data in genetic studies. Using simulations, we evaluated the robustness of WCR-MO, in terms of type I error rates, for the analysis of a continuous trait in genetic research. WCR-MO performed well at the 5% α-level. However, it yielded inflated type I error rates at α-levels below 5%, implying that the procedure is liberal and may not be ready for application to genetic studies, where the α-levels used are typically much smaller than 0.05.
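The resampling idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the estimator, so the sketch assumes the commonly described form in which one member is drawn at random from each cluster, a standard analysis is run on the resulting independent observations, and the per-resample estimates are averaged, with the variance taken as the mean of the per-resample variance estimates minus the sample variance of the per-resample estimates. The function name `wcr_mo_slope`, its parameters, and the choice of a simple linear regression slope as the per-resample analysis are all hypothetical.

```python
import numpy as np

def wcr_mo_slope(cluster_ids, x, y, n_resamples=1000, seed=0):
    """Sketch of a WCR-MO estimate of a regression slope (assumed form).

    cluster_ids : family (cluster) label for each observation
    x, y        : predictor (e.g. genotype score) and continuous trait
    Returns (estimate, variance); the variance can come out negative
    when n_resamples is small, so callers should check it.
    """
    rng = np.random.default_rng(seed)
    cluster_ids = np.asarray(cluster_ids)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Row indices of the members of each cluster.
    members = [np.flatnonzero(cluster_ids == c) for c in np.unique(cluster_ids)]

    slopes = np.empty(n_resamples)
    variances = np.empty(n_resamples)
    for q in range(n_resamples):
        # Keep one randomly chosen member per cluster, discard the rest.
        pick = np.array([rng.choice(idx) for idx in members])
        xs, ys = x[pick], y[pick]

        # Ordinary least-squares slope and its usual variance estimate.
        # Assumes the sampled x values are not all identical (sxx > 0).
        xc = xs - xs.mean()
        sxx = np.sum(xc ** 2)
        slope = np.sum(xc * ys) / sxx
        resid = ys - ys.mean() - slope * xc
        sigma2 = np.sum(resid ** 2) / (len(pick) - 2)

        slopes[q] = slope
        variances[q] = sigma2 / sxx

    estimate = slopes.mean()
    # Assumed variance form: mean within-resample variance minus the
    # between-resample sample variance of the estimates.
    variance = variances.mean() - slopes.var(ddof=1)
    return estimate, variance
```

A Wald-type statistic can then be formed as `estimate / np.sqrt(variance)` whenever `variance` is positive. Repeating this on data simulated under the null hypothesis and counting rejections at a chosen α-level is one way to estimate a type I error rate of the kind the abstract reports.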

[1] C. R. Weinberg, et al. Analysis of Clustered Binary Outcomes Using Within‐Cluster Paired Resampling, 2002, Biometrics.

[2] L. Luzzatto, et al. Transcriptional Functionality of Germ Line p53 Mutants Influences Cancer Phenotype, 2007, Clinical Cancer Research.

[3] J. Hanley, et al. Statistical analysis of correlated data using generalized estimating equations: an orientation, 2003, American journal of epidemiology.

[4] Katherine S Panageas, et al. Properties of analysis methods that account for clustering in volume–outcome studies when the primary predictor is cluster size, 2007, Statistics in medicine.

[5] Peter B Gilbert, et al. Two‐Sample Tests for Comparing Intra‐Individual Genetic Sequence Diversity between Populations, 2005, Biometrics.

[6] E. Leifer, et al. Multiple Outputation: Inference for Complex Clustered Data by Averaging Analyses from Independent Data, 2003, Biometrics.

[7] David G Addiss, et al. Modeling survival data with informative cluster size, 2008, Statistics in medicine.

[8] Pranab Kumar Sen, et al. Within‐cluster resampling, 2001.

[9] Somnath Datta, et al. Rank-Sum Tests for Clustered Data, 2005.

[10] M. Corey, et al. Confidence intervals for candidate gene effects and environmental factors in population‐based association studies of families, 2007, Annals of human genetics.

[11] Marina Vannucci, et al. Wavelet-Based Nonparametric Modeling of Hierarchical Functions in Colon Carcinogenesis, 2003.

[12] C. Faes, et al. Estimating herd‐specific force of infection by using random‐effects models for clustered binary data and monotone fractional polynomials, 2006.

[13] L. Hedges, et al. Statistical Methods for Meta-Analysis, 1987.

[14] M. Walsh, et al. Inhaled nitric oxide in preterm infants undergoing mechanical ventilation, 2006, The New England journal of medicine.

[15] Hannu Oja, et al. A weighted multivariate sign test for cluster-correlated data, 2005.

[16] E. Kistner, et al. A method for identifying genes related to a quantitative trait, incorporating multiple siblings and missing parents, 2005, Genetic epidemiology.

[17] J. N. K. Rao, et al. Mean estimating equation approach to analysing cluster-correlated data with nonignorable cluster sizes, 2005.

[18] R. Dewar, et al. Comparison of the Abbott 7000 and Bayer 340 Systems for Measurement of Hepatitis C Virus Load, 2007, Journal of Clinical Microbiology.

[19] Somnath Datta, et al. Marginal Analyses of Clustered Data When Cluster Size Is Informative, 2003, Biometrics.

[20] D. Allison, et al. Towards sound epistemological foundations of statistical methods for high-dimensional biology, 2004, Nature Genetics.

[21] E. Agrón, et al. Genotype-phenotype correlation in von Hippel-Lindau disease with retinal angiomatosis, 2007, Archives of ophthalmology.

[22] Wenbin Lu. Marginal Regression of Multivariate Event Times Based on Linear Transformation Models, 2005, Lifetime data analysis.

[23] D. Finkelstein, et al. Multivariate logistic regression for familial aggregation in age at disease onset, 2007, Lifetime data analysis.

[24] D. Allison, et al. The Effect of Assortative Mating upon Genetic Association Studies: Spurious Associations and Population Substructure in the Absence of Admixture, 2006, Behavior genetics.

[25] S. G. Meester, et al. A parametric model for cluster correlated categorical data, 1994, Biometrics.