Modern sample size determination for unordered categorical data

Sample size determination is one of the most important practical tasks for statisticians. In this paper, we study sample size determination for unordered categorical data, with or without a pilot sample. When a pilot sample is available, we provide a minimal difference method, a first-order correction, and bootstrap methods for determining the sample size needed to compare two multinomial distributions with the usual chi-squared test. We also propose a Bayesian approach that uses an extension of a posterior predictive p-value. We investigate the performance of these methods through both a simulation study and a real application to leukoplakia lesion data. When the sampling distribution is highly skewed, we advocate a performance measure that is better than mean squared error (MSE). Practical recommendations and some asymptotic results are also provided.
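
As background for the chi-squared comparison of two multinomial distributions, the sketch below shows the classical plug-in calculation, not any of the methods proposed in this paper. It assumes equal group sizes and treats pilot proportions as if they were the true cell probabilities, so that the Pearson statistic is approximately noncentral chi-squared with K - 1 degrees of freedom and noncentrality n * sum_k (p1k - p2k)^2 / (p1k + p2k). The function names and the pilot proportions are hypothetical and chosen only for illustration.

```python
import math

def gamma_p(a, x):
    """Regularized lower incomplete gamma P(a, x), computed by its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 1
    while abs(term) > 1e-16 * abs(total):
        term *= x / (a + k)
        total += term
        k += 1
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_cdf(x, df):
    """CDF of the central chi-squared distribution."""
    return gamma_p(df / 2.0, x / 2.0)

def chi2_critical(df, alpha):
    """Upper-alpha critical value of the central chi-squared, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ncx2_cdf(x, df, lam):
    """Noncentral chi-squared CDF as a Poisson mixture of central CDFs."""
    if lam <= 0.0:
        return chi2_cdf(x, df)
    h = lam / 2.0
    j_max = int(h + 10.0 * math.sqrt(h) + 30.0)  # Poisson tail beyond this is negligible
    total = 0.0
    for j in range(j_max + 1):
        weight = math.exp(-h + j * math.log(h) - math.lgamma(j + 1))
        total += weight * chi2_cdf(x, df + 2 * j)
    return total

def chi2_power(lam, df, alpha=0.05):
    """Approximate power of the chi-squared test at noncentrality lam."""
    return 1.0 - ncx2_cdf(chi2_critical(df, alpha), df, lam)

def effect_per_subject(p1, p2):
    """Noncentrality contributed by one subject per group (equal group sizes)."""
    if len(p1) != len(p2):
        raise ValueError("p1 and p2 must have the same number of categories")
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p1, p2) if a + b > 0)

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Smallest per-group n whose approximate power reaches the target."""
    effect = effect_per_subject(p1, p2)
    if effect <= 0.0:
        raise ValueError("the two distributions are identical; no finite n suffices")
    df = len(p1) - 1
    hi = 1
    while chi2_power(hi * effect, df, alpha) < power:  # power is increasing in n
        hi *= 2
    lo = hi // 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if chi2_power(mid * effect, df, alpha) >= power:
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical pilot proportions for two groups over three unordered categories.
pilot_p1 = [0.5, 0.3, 0.2]
pilot_p2 = [0.3, 0.4, 0.3]
print(n_per_group(pilot_p1, pilot_p2, alpha=0.05, power=0.80))
```

Because the pilot proportions are plugged in as if known, this calculation ignores their sampling variability, which is the kind of uncertainty that correction, bootstrap, and Bayesian approaches are meant to address.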
