PEARSON'S VERSUS SPEARMAN'S AND KENDALL'S CORRELATION COEFFICIENTS FOR CONTINUOUS DATA

The association between two variables is often of interest in data analysis and methodological research. Pearson's, Spearman's and Kendall's correlation coefficients are the most commonly used measures of monotone association, with the latter two usually recommended for non-normally distributed data. These three correlation coefficients can be represented as differently weighted averages of the same concordance indicators. The weighting used in Pearson's correlation coefficient could be preferable for reflecting monotone association in some types of continuous data that are not necessarily bivariate normal.

In this work, I investigate how the intrinsic properties of Pearson's, Spearman's and Kendall's correlation coefficients affect the statistical power of tests for monotone association in continuous data. This investigation is important in many fields, including public health, since it can lead to guidelines that help save health research resources by reducing the number of inconclusive studies and by enabling the design of powerful studies with smaller sample sizes.

The statistical power can be affected both by the structure of the correlation coefficient employed and by the type of test statistic. Hence, I standardize the comparison of the intrinsic properties of the correlation coefficients by using a permutation test that is applicable to all of them. In the simulation study, I consider four types of continuous bivariate distributions composed of pairs of normal, log-normal, double exponential and t distributions. These distributions make it possible to model scenarios with different degrees of violation of normality with respect to skewness and kurtosis.

As a result of the simulation study, I demonstrate that Pearson's correlation coefficient could offer a substantial improvement in statistical power even for distributions with moderate skewness or excess kurtosis.
Nonetheless, because of its known sensitivity to outliers, Pearson's correlation leads to a less powerful statistical test for distributions with extreme skewness or excess kurtosis (where datasets with outliers are more likely). In conclusion, the results of my investigation indicate that Pearson's correlation coefficient could have substantial advantages for continuous non-normal data that do not have obvious outliers. Thus, the shape of the distribution should not be the sole reason for not using the Pearson product-moment correlation coefficient.
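
The claim that the three coefficients are differently weighted combinations of the same concordance indicators can be checked numerically. Because the sum over all pairs i < j of (x_i - x_j)(y_i - y_j) equals n times the usual cross-product sum about the means, Pearson's r can be written over pairs: each concordance indicator c_ij = sign(x_i - x_j) sign(y_i - y_j) gets the weight |x_i - x_j| |y_i - y_j|, Spearman's rho uses the same weights on ranks, and Kendall's tau weights every pair equally. The following minimal Python sketch (standard library only) computes all three this way and feeds each into one Monte Carlo permutation test. It assumes continuous data without ties; the function names, the two-sided statistic |coefficient| and the default of 999 random permutations are illustrative choices, not the procedure documented in the paper.

```python
import math
import random
from itertools import combinations


def sign(v):
    """Return +1, -1 or 0."""
    return (v > 0) - (v < 0)


def ranks(values):
    """Ranks 1..n (assumes continuous data, i.e. no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def concordance_coefficient(x, y, magnitude_weights):
    """Combine the concordance indicators c_ij = sign(dx) * sign(dy) over pairs i < j.

    magnitude_weights=False: every pair has weight 1 and the sum is divided by
    the number of pairs (Kendall's tau-a, equal to tau-b when there are no ties).
    magnitude_weights=True: pair weight |dx| * |dy|, and the sum is divided by
    sqrt(sum dx^2 * sum dy^2), which reproduces Pearson's r exactly.
    """
    total = 0.0
    sxx = syy = 0.0
    n_pairs = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        c = sign(dx) * sign(dy)  # +1 concordant pair, -1 discordant pair
        if magnitude_weights:
            total += abs(dx) * abs(dy) * c  # identical to dx * dy
            sxx += dx * dx
            syy += dy * dy
        else:
            total += c
        n_pairs += 1
    if magnitude_weights:
        return total / math.sqrt(sxx * syy)
    return total / n_pairs


def pearson(x, y):
    return concordance_coefficient(x, y, magnitude_weights=True)


def spearman(x, y):
    # Spearman's rho is Pearson's r on ranks: the same weights, applied to rank differences.
    return concordance_coefficient(ranks(x), ranks(y), magnitude_weights=True)


def kendall(x, y):
    return concordance_coefficient(x, y, magnitude_weights=False)


def permutation_pvalue(x, y, stat, n_perm=999, seed=0):
    """Two-sided Monte Carlo permutation p-value for H0: x and y are independent.

    Shuffling y destroys any association while keeping both marginal samples,
    so the same test applies unchanged to any of the three coefficients.
    """
    rng = random.Random(seed)
    observed = abs(stat(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(stat(x, y_perm)) >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return (hits + 1) / (n_perm + 1)  # counts the observed arrangement as well


x, y = [1, 2, 3, 4], [1, 3, 2, 4]
print(round(pearson(x, y), 4), round(spearman(x, y), 4), round(kendall(x, y), 4))
# → 0.8 0.8 0.6667

rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(30)]
ys = [0.5 * v + rng.gauss(0.0, 1.0) for v in xs]
for name, stat in (("Pearson", pearson), ("Spearman", spearman), ("Kendall", kendall)):
    print(name, permutation_pvalue(xs, ys, stat))
```

Because the null distribution is generated from the observed data by shuffling, one function serves all three coefficients; differences in power between them then reflect the coefficients themselves rather than their different large-sample approximations.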

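The four simulation scenarios can also be sketched. The exact constructions and parameters used in the paper are not given here, so this Python sketch assumes one common recipe: start from a pair of standard normals with correlation rho, then exponentiate both (log-normal), multiply both by the square root of a shared Exp(1) variable (a symmetric bivariate double exponential, i.e. Laplace, distribution), or divide both by sqrt(V/df) with a shared chi-square variable V (bivariate t). The values rho = 0.3 and df = 5, the sample size, and the function names are hypothetical. Printing the marginal skewness and excess kurtosis shows how far each scenario departs from normality.

```python
import math
import random


def correlated_normals(rng, rho):
    """One pair (Z1, Z2) of standard normals with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    return z1, z2


def sample_pair(rng, family, rho=0.3, df=5):
    """Draw one (x, y) pair under an ASSUMED construction of each family.

    These recipes are illustrative, not necessarily those used in the paper:
      normal    -> (Z1, Z2)
      lognormal -> (exp Z1, exp Z2)
      laplace   -> sqrt(W) * (Z1, Z2), W ~ Exp(1), shared by both coordinates
      t         -> (Z1, Z2) / sqrt(V / df), V ~ chi-square(df), shared
    """
    z1, z2 = correlated_normals(rng, rho)
    if family == "normal":
        return z1, z2
    if family == "lognormal":
        return math.exp(z1), math.exp(z2)
    if family == "laplace":
        s = math.sqrt(rng.expovariate(1.0))
        return s * z1, s * z2
    if family == "t":
        v = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) = Gamma(shape df/2, scale 2)
        s = 1.0 / math.sqrt(v / df)
        return s * z1, s * z2
    raise ValueError(f"unknown family: {family}")


def skew_and_excess_kurtosis(values):
    """Moment-based (biased) sample skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


rng = random.Random(42)
summary = {}
for family in ("normal", "lognormal", "laplace", "t"):
    first_coords = [sample_pair(rng, family)[0] for _ in range(100_000)]
    summary[family] = skew_and_excess_kurtosis(first_coords)
    print(family, [round(v, 2) for v in summary[family]])
```

Under these assumed constructions, the shared scale factor leaves the Pearson correlation of the Laplace and t pairs equal to rho (for df > 2), whereas exponentiating shrinks it to (e^rho - 1)/(e - 1); a power comparison built on this sketch would therefore need to fix the strength of association deliberately rather than assume rho carries over unchanged.
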
Nian Shong Chok
