Methods for analysis of skewed data distributions in psychiatric clinical studies: working with many zero values.

OBJECTIVE: Psychiatric clinical studies, including those in drug abuse research, often yield data that are difficult to analyze and to use for hypothesis testing because they are heavily skewed and contain an abundance of zero values. The authors consider methods of analyzing data with those characteristics.

METHOD: The authors review the possible meanings of zero values and the statistical methods that are appropriate for analyzing data with many zero values in both cross-sectional and longitudinal designs. They illustrate the application of these alternative methods using sample data collected with the Addiction Severity Index.

RESULTS: If a zero value is considered the lowest value on a scale that measures severity, data that include many zeros may be analyzed with several methods other than standard parametric tests. If zero values are instead considered an indication of a case without a problem, for which a measure of severity is not meaningful, analyses should include separate statistical models for the zero values and for the nonzero values. Tests linking the separate models are available.

CONCLUSIONS: Standard methods, such as t tests and analyses of variance, may be poor choices for data that are heavily skewed and contain many zero values. The use of statistical methods suited to such data leads to more meaningful study results and conclusions.
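The idea of modeling the zero values and the nonzero values separately, and then linking the two with a single test, can be illustrated with a minimal sketch. The abstract does not specify which linking test the authors use, so the code below assumes one common choice: a two-part statistic that sums the squared z statistic for the difference in the proportion of zeros and the squared z statistic from a Wilcoxon rank-sum test on the nonzero values, and refers the sum to a chi-square distribution with 2 degrees of freedom. The function names and the sample scores are hypothetical and are not taken from the paper or from the Addiction Severity Index data.

```python
import math


def two_proportion_z(zeros_a, n_a, zeros_b, n_b):
    """z statistic for the difference in the proportion of zeros (pooled SE)."""
    pooled = (zeros_a + zeros_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (zeros_a / n_a - zeros_b / n_b) / se


def rank_sum_z(a, b):
    """Wilcoxon rank-sum z (normal approximation, midranks, tie-corrected)."""
    n_a, n_b = len(a), len(b)
    if n_a == 0 or n_b == 0:
        return 0.0  # nothing to compare in the nonzero part
    combined = sorted(a + b)
    n = n_a + n_b
    midrank = {}
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j] == combined[i]:
            j += 1
        t = j - i                                 # size of this tie group
        midrank[combined[i]] = (i + 1 + j) / 2    # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    w = sum(midrank[x] for x in a)
    mean_w = n_a * (n + 1) / 2
    var_w = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    return 0.0 if var_w <= 0 else (w - mean_w) / math.sqrt(var_w)


def two_part_test(group_a, group_b):
    """Combine the zero part and the nonzero part into one 2-df chi-square."""
    nonzero_a = [x for x in group_a if x != 0]
    nonzero_b = [x for x in group_b if x != 0]
    z_zero = two_proportion_z(len(group_a) - len(nonzero_a), len(group_a),
                              len(group_b) - len(nonzero_b), len(group_b))
    z_severity = rank_sum_z(nonzero_a, nonzero_b)
    chi2 = z_zero ** 2 + z_severity ** 2
    # Chi-square survival function with 2 df is exp(-x / 2).
    # Simplification: 2 df is used even if one part is degenerate.
    p_value = math.exp(-chi2 / 2)
    return {"z_zero": z_zero, "z_severity": z_severity,
            "chi2": chi2, "p_value": p_value}


if __name__ == "__main__":
    # Hypothetical severity scores; 0 means "no problem reported".
    treatment = [0, 0, 0, 0, 1, 2]
    control = [0, 3, 4, 5, 6, 7]
    print(two_part_test(treatment, control))
```

The sketch covers only a cross-sectional, two-group comparison. With small samples the normal approximations are rough, and a permutation version of the same statistic may be preferable.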
